AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

The public needs to control AI-run infrastructure, labor, education, and governance, NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min
AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min
Generative AI

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min

All Content

AI Safety

[2602.05119] Unbiased Single-Queried Gradient for Combinatorial Objective

This paper presents a novel stochastic gradient method for combinatorial optimization that requires only a single query, enhancing effici...

arXiv - Machine Learning · 3 min
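
The teaser above only names the contribution. For orientation, the classical way to get an unbiased gradient of a black-box combinatorial objective from a single query is the score-function (REINFORCE) estimator; the sketch below illustrates that baseline idea under an assumed Bernoulli parameterization, and is not the paper's method.

```python
import numpy as np

def score_function_grad(theta, objective, rng):
    """Unbiased one-query gradient of E_{x ~ Bern(sigmoid(theta))}[objective(x)]
    via the log-derivative trick: f(x) * grad_theta log p(x | theta)."""
    p = 1.0 / (1.0 + np.exp(-theta))                 # Bernoulli probabilities
    x = (rng.random(theta.shape) < p).astype(float)  # one combinatorial sample
    f = objective(x)                                 # the single objective query
    return f * (x - p)                               # grad of log-likelihood is x - p

# Toy objective: negative Hamming distance to a hidden bit pattern.
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=10).astype(float)
objective = lambda x: -np.abs(x - target).sum()

theta = np.zeros(10)
for _ in range(2000):                                # plain SGD on the estimator
    theta += 0.1 * score_function_grad(theta, objective, rng)
print(np.round(1.0 / (1.0 + np.exp(-theta)), 2))     # probabilities drift toward target bits
```
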
Machine Learning

[2602.02929] RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection

This paper presents RPG-AE, a neuro-symbolic framework combining Graph Autoencoders and rare pattern mining for detecting Advanced Persis...

arXiv - AI · 4 min
LLMs

[2505.13766] Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques

This article reviews the integration of Large Language Models (LLMs) in Software Quality Assurance (SQA), highlighting their potential to...

arXiv - AI · 4 min
Machine Learning

[2602.02201] Cardinality-Preserving Attention Channels for Graph Transformers in Molecular Property Prediction

This article presents a novel graph transformer model, incorporating cardinality-preserving attention channels, to enhance molecular prop...

arXiv - Machine Learning · 3 min
LLMs

[2601.16905] GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints

The paper presents GRIP, a novel algorithm-agnostic framework for machine unlearning in Mixture-of-Experts architectures, addressing the ...

arXiv - AI · 4 min
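
The snippet gives only the framing, so the following is a loose illustration rather than GRIP itself: one generic way to impose a router-level unlearning constraint in a Mixture-of-Experts model is to penalize the routing mass that forget-set tokens place on designated experts. The expert indices, mask, and penalty form here are all invented for the example.

```python
import torch
import torch.nn.functional as F

def router_unlearning_penalty(router_logits, forget_mask, banned_experts):
    """Generic router-level unlearning penalty (illustrative, not GRIP):
    push routing probability for forget-set tokens off selected experts."""
    probs = F.softmax(router_logits, dim=-1)             # (tokens, n_experts)
    banned_mass = probs[:, banned_experts].sum(dim=-1)   # mass on those experts
    return (banned_mass * forget_mask.float()).mean()    # only forget tokens count

# Toy routing over 8 experts for 6 tokens; the first 3 are forget-set tokens.
logits = torch.randn(6, 8, requires_grad=True)
forget = torch.tensor([True, True, True, False, False, False])
penalty = router_unlearning_penalty(logits, forget, banned_experts=[2, 5])
penalty.backward()                                       # gradients reach only the router
print(penalty.item())
```
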
LLMs

[2504.21205] SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories

The paper presents SecRepoBench, a benchmark designed to evaluate code agents' performance in secure code completion across real-world C/...

arXiv - AI · 3 min
Machine Learning

[2504.20903] Modeling AI-Human Collaboration as a Multi-Agent Adaptation

This paper explores AI-human collaboration through agent-based simulations, revealing how distinct decision-making heuristics impact perf...

arXiv - AI · 4 min
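
As a toy companion to the teaser: agent-based studies of this kind typically pit simple decision heuristics against each other on a shared performance landscape. The sketch below is an assumed, minimal setup (a random quadratic landscape, greedy vs. random-flip hill climbing), not the paper's simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 12
weights = rng.normal(size=N)           # linear payoff terms
pairs = rng.normal(size=(N, N))        # pairwise interaction terms

def fitness(x):
    """Toy rugged performance landscape over length-N binary solutions."""
    return float(weights @ x + x @ pairs @ x)

def greedy_step(x):
    """Heuristic 1: evaluate every single-bit flip, take the best improvement."""
    flips = []
    for i in range(N):
        y = x.copy()
        y[i] = 1 - y[i]
        flips.append(y)
    best = max(flips, key=fitness)
    return best if fitness(best) > fitness(x) else x

def random_flip_step(x):
    """Heuristic 2: flip one random bit, keep it only if fitness improves."""
    y = x.copy()
    i = rng.integers(N)
    y[i] = 1 - y[i]
    return y if fitness(y) > fitness(x) else x

for name, step in [("greedy", greedy_step), ("random-flip", random_flip_step)]:
    x = rng.integers(0, 2, size=N).astype(float)
    for _ in range(100):
        x = step(x)
    print(name, round(fitness(x), 2))  # heuristics settle on different local optima
```
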
LLMs

[2601.12415] Orthogonalized Policy Optimization: Decoupling Sampling Geometry from Optimization Geometry in RLHF

This paper introduces Orthogonalized Policy Optimization (OPO), a new approach in reinforcement learning that separates sampling and opti...

arXiv - Machine Learning · 4 min
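
Only the high-level claim survives the truncation, so as background: the generic mechanism for separating where samples come from and where gradients are taken is importance weighting between a sampling policy and the policy being optimized. The sketch below shows that textbook decoupling with made-up numbers; OPO's actual construction is not reproduced here.

```python
import torch

def decoupled_pg_loss(logp_target, logp_sampler, rewards):
    """Policy-gradient loss where samples come from a separate sampling
    distribution; importance ratios correct for the mismatch, decoupling
    the optimization geometry from the sampling geometry (generic sketch,
    not the paper's OPO algorithm)."""
    ratio = torch.exp(logp_target - logp_sampler.detach())  # importance weights
    return -(ratio * rewards).mean()

# Toy usage with fabricated log-probs and rewards for three sampled responses.
logp_target = torch.tensor([-1.2, -0.7, -2.1], requires_grad=True)
logp_sampler = torch.tensor([-1.0, -0.9, -1.8])
rewards = torch.tensor([0.5, 1.0, -0.2])

loss = decoupled_pg_loss(logp_target, logp_sampler, rewards)
loss.backward()
print(logp_target.grad)  # gradient flows only through the target policy
```
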
Machine Learning

[2601.03213] Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion

The paper presents a novel reinforcement learning framework for unlearning targeted concepts in text-to-image diffusion models, enhancing...

arXiv - Machine Learning · 4 min
Machine Learning

[2512.12832] Network Level Evaluation of Hangup Susceptibility of HRGCs using Deep Learning and Sensing Techniques: A Goal Towards Safer Future

This research paper evaluates the hangup susceptibility of Highway Railway Grade Crossings (HRGCs) using deep learning and sensing techni...

arXiv - AI · 4 min
Machine Learning

[2501.07575] Dataset Distillation via Committee Voting

The paper presents a novel method for dataset distillation called Committee Voting for Dataset Distillation (CV-DD), which enhances data ...

arXiv - AI · 4 min
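
The snippet names the idea but not the mechanics, so here is a deliberately simplified take: a committee of models scores candidate distilled samples and consensus decides what is kept. The committee, data, and agreement score below are all assumptions for illustration; CV-DD's actual procedure is not in the teaser.

```python
import numpy as np

def committee_vote_select(candidates, committee, k):
    """Score candidate distilled samples by committee agreement and keep the
    top-k. Generic committee-voting selection, not CV-DD itself."""
    votes = np.stack([m(candidates) for m in committee])   # (n_models, n_cands)
    majority = (votes.mean(axis=0) > 0.5).astype(int)      # per-candidate majority label
    agreement = (votes == majority).mean(axis=0)           # fraction of models agreeing
    return candidates[np.argsort(-agreement)[:k]]          # keep high-consensus samples

# Toy committee: three linear probes with slightly different weights.
rng = np.random.default_rng(1)
w = rng.normal(size=(3, 5))
committee = [lambda X, w=w_i: (X @ w > 0).astype(int) for w_i in w]

candidates = rng.normal(size=(100, 5))
distilled = committee_vote_select(candidates, committee, k=10)
print(distilled.shape)  # (10, 5): the ten highest-consensus candidates
```
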
Machine Learning

[2511.17879] Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

This paper presents a novel method using generative adversarial training to address reward hacking in real-time human-AI music interactio...

arXiv - Machine Learning · 4 min
LLMs

[2410.15756] Automated Proof Generation for Rust Code via Self-Evolution

This paper presents SAFE, a framework for automated proof generation in Rust code, addressing the challenge of insufficient human-written...

arXiv - AI · 4 min
Machine Learning

[2406.04955] Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios

This article presents an experimental evaluation of ROS-Causal, a framework for causal discovery in human-robot spatial interactions, dem...

arXiv - AI · 4 min
LLMs

[2602.12150] GPT-4o Lacks Core Features of Theory of Mind

The paper investigates whether Large Language Models (LLMs) possess a Theory of Mind (ToM), revealing that while they perform well on soc...

arXiv - Machine Learning · 3 min
AI Safety

[2602.08449] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

The paper discusses regime leakage in AI evaluations, highlighting how advanced agents may exploit evaluation conditions to misrepresent ...

arXiv - Machine Learning · 4 min
LLMs

[2602.06855] AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

AIRS-Bench introduces a suite of 20 tasks designed to evaluate AI agents' capabilities in scientific research, highlighting areas of stre...

arXiv - AI · 4 min
LLMs

[2509.22067] The Rogue Scalpel: Activation Steering Compromises LLM Safety

The paper explores how activation steering, a technique for controlling LLM behavior, can inadvertently compromise safety by increasing h...

arXiv - AI · 3 min
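
Since the teaser only names the intervention being audited, here is what activation steering itself looks like in minimal form: add a fixed direction to a layer's activations through a PyTorch forward hook. The toy linear block stands in for a real transformer layer; the vector and scale are arbitrary assumptions.

```python
import torch
import torch.nn as nn

def add_steering_hook(layer, vector, alpha=4.0):
    """Add `alpha * vector` to a layer's output activations on every forward
    pass: the textbook activation-steering intervention."""
    def hook(module, inputs, output):
        return output + alpha * vector   # returned value replaces the output
    return layer.register_forward_hook(hook)

# Toy stand-in for one transformer block (hidden size 16).
torch.manual_seed(0)
block = nn.Linear(16, 16)
steer = torch.randn(16)
steer = steer / steer.norm()             # unit-norm steering direction

handle = add_steering_hook(block, steer)
x = torch.randn(2, 16)
steered = block(x)
handle.remove()                          # the intervention is removable
print((steered - block(x)).norm())       # nonzero: activations were shifted
```
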
Machine Learning

[2602.05354] PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents

The paper introduces PATHWAYS, a benchmark assessing AI web agents' ability to discover and utilize hidden contextual information in mult...

arXiv - AI · 3 min
LLMs

[2602.00851] Persuasion Propagation in LLM Agents

The paper explores how user persuasion affects the behavior of large language model (LLM) agents during long-horizon tasks, revealing tha...

arXiv - AI · 3 min
