AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

The public needs to control AI-run infrastructure, labor, education, and governance — NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min
AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min
Generative AI

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min

All Content

LLMs

[2601.19245] Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection

This paper introduces SpikeScore, a novel method for detecting hallucinations in multi-turn dialogues across different domains, enhancing...

arXiv - Machine Learning · 4 min
LLMs

[2508.13415] MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection

The paper introduces MAVIS, a framework for aligning large language models (LLMs) to multiple objectives at inference time, enhancing fle...

arXiv - Machine Learning · 4 min
AI Safety

[2601.08005] Internal Deployment Gaps in AI Regulation

This article examines the regulatory gaps in AI deployment within organizations, highlighting issues that allow internal systems to evade...

arXiv - AI · 3 min
LLMs

[2601.05525] Explainable AI: Learning from the Learners

This article discusses the importance of explainable AI (XAI) in enhancing trust and accountability in AI applications, particularly in s...

arXiv - Machine Learning · 3 min
Machine Learning

[2508.07428] Lightning Prediction under Uncertainty: DeepLight with Hazy Loss

The paper presents DeepLight, a novel deep learning architecture designed for predicting lightning occurrences by addressing the limitati...

arXiv - AI · 4 min
LLMs

[2508.06361] Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

This article investigates the phenomenon of self-initiated deception in Large Language Models (LLMs) when responding to benign prompts, h...

arXiv - AI · 4 min
LLMs

[2512.19027] Recontextualization Mitigates Specification Gaming without Modifying the Specification

The paper discusses a novel approach called recontextualization, which aims to reduce specification gaming in language models without alt...

arXiv - Machine Learning · 3 min
Machine Learning

[2511.14853] Uncertainty-Aware Measurement of Scenario Suite Representativeness for Autonomous Systems

This paper presents a probabilistic method to measure the representativeness of scenario suites for autonomous systems, focusing on ensur...

arXiv - AI · 4 min
Machine Learning

[2506.15715] NeuronSeek: On Stability and Expressivity of Task-driven Neurons

The paper introduces NeuronSeek, a framework that enhances the stability and expressivity of task-driven neurons in deep learning through...

arXiv - AI · 3 min
LLMs

[2510.20102] Human-Centered LLM-Agent System for Detecting Anomalous Digital Asset Transactions

The paper presents HCLA, a human-centered multi-agent system designed for detecting anomalies in digital asset transactions, enhancing in...

arXiv - AI · 4 min
LLMs

[2506.13593] Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs

This paper introduces a novel safety measure, time-to-unsafe-sampling, for evaluating generative models, focusing on predicting unsafe ou...

arXiv - Machine Learning · 4 min
LLMs

[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in ris...

arXiv - AI · 4 min
AI Agents

[2508.13213] AI sustains higher strategic tension than humans in chess

This article examines how AI maintains higher strategic tension in chess compared to human players, revealing insights into decision-maki...

arXiv - AI · 4 min
Machine Learning

[2505.15008] Know When to Abstain: Optimal Selective Classification with Likelihood Ratios

The paper discusses optimal selective classification using likelihood ratios, enhancing predictive model reliability by allowing abstenti...

arXiv - Machine Learning · 4 min
LLMs

[2508.08500] Large Language Models as Oracles for Ontology Alignment

This article explores the use of Large Language Models (LLMs) as tools for improving ontology alignment, demonstrating their effectivenes...

arXiv - AI · 3 min
Machine Learning

[2505.11771] Residual Feature Integration is Sufficient to Prevent Negative Transfer

This paper presents a novel approach to prevent negative transfer in transfer learning by integrating residual features from pretrained m...

arXiv - AI · 4 min
LLMs

[2506.02873] It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics

This article evaluates the persuasive capabilities of frontier large language models (LLMs) on harmful topics, introducing a new benchmar...

arXiv - AI · 4 min
LLMs

[2505.11839] On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

This paper explores the capabilities of large language models (LLMs) in counterfactual reasoning through a decompositional approach, iden...

arXiv - AI · 3 min
LLMs

[2503.08796] Robust Multi-Objective Controlled Decoding of Large Language Models

This article presents Robust Multi-Objective Decoding (RMOD), an innovative algorithm designed to enhance the performance of Large Langua...

arXiv - AI · 3 min
LLMs

[2502.14560] Less is More: Improving LLM Alignment via Preference Data Selection

This article discusses a novel approach to improving large language model (LLM) alignment through effective preference data selection, en...

arXiv - AI · 4 min