AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

Generative AI

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion


arXiv - AI · 3 min · about 6 hours ago

LLMs

[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models


arXiv - AI · 4 min · about 6 hours ago

All Content

LLMs

[2602.15763] GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 introduces a next-generation foundation model that enhances coding capabilities through agentic engineering, reducing costs while i...

arXiv - Machine Learning · 5 min · about 2 months ago

Machine Learning

[2602.15676] Relative Geometry of Neural Forecasters: Linking Accuracy and Alignment in Learned Latent Geometry

This paper explores how neural networks represent latent geometry in forecasting complex dynamical systems, linking model alignment with ...

arXiv - AI · 3 min · about 2 months ago

AI Safety

[2602.15637] The Stationarity Bias: Stratified Stress-Testing for Time-Series Imputation in Regulated Dynamical Systems

The paper discusses the 'Stationarity Bias' in time-series imputation, proposing a 'Stratified Stress-Test' to evaluate methods under dif...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.15602] Certified Per-Instance Unlearning Using Individual Sensitivity Bounds

This article presents a novel approach to certified machine unlearning through adaptive per-instance noise calibration, significantly red...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.15586] Uniform error bounds for quantized dynamical models

This paper presents uniform error bounds for quantized dynamical models, providing statistical guarantees on their accuracy when learned ...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.15571] Accelerated Predictive Coding Networks via Direct Kolen-Pollack Feedback Alignment

The paper introduces Direct Kolen-Pollack Predictive Coding (DKP-PC), an innovative approach that enhances the efficiency of predictive c...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.15515] The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes

The paper explores how AI models can learn to obfuscate deception when trained against white-box deception detectors, introducing a taxon...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.15503] Approximation Theory for Lipschitz Continuous Transformers

This paper explores the approximation theory for Lipschitz continuous Transformers, establishing a theoretical foundation for their stabi...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.15481] LLM-as-Judge on a Budget

The paper presents a novel approach to efficiently evaluate large language models (LLMs) under budget constraints, utilizing multi-armed ...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.15499] ExLipBaB: Exact Lipschitz Constant Computation for Piecewise Linear Neural Networks

The paper presents ExLipBaB, a method for exact computation of Lipschitz constants in piecewise linear neural networks, addressing limita...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.15438] Logit Distance Bounds Representational Similarity

This paper explores the relationship between logit distance and representational similarity in discriminative models, demonstrating that ...

arXiv - AI · 4 min · about 2 months ago

NLP

[2602.15407] Fairness over Equality: Correcting Social Incentives in Asymmetric Sequential Social Dilemmas

This paper explores how asymmetric conditions in Sequential Social Dilemmas affect cooperation dynamics in Multi-Agent Reinforcement Lear...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.15344] ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models

The paper presents ER-MIA, a framework for black-box adversarial memory injection attacks on long-term memory-augmented large language mo...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.15338] Discovering Implicit Large Language Model Alignment Objectives

This article presents a framework called Obj-Disco, which identifies implicit alignment objectives in large language models (LLMs) to enh...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.15304] Hybrid Federated and Split Learning for Privacy Preserving Clinical Prediction and Treatment Optimization

This article presents a hybrid framework combining Federated Learning and Split Learning to enhance privacy in clinical decision-making w...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.15283] Complex-Valued Unitary Representations as Classification Heads for Improved Uncertainty Quantification in Deep Neural Networks

This paper introduces a novel classification head architecture using complex-valued unitary representations to enhance uncertainty quanti...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.15238] Closing the Distribution Gap in Adversarial Training for LLMs

This article discusses a novel approach to adversarial training for large language models (LLMs), proposing Distributional Adversarial Tr...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.15222] Automatically Finding Reward Model Biases

This article presents a novel approach to identifying biases in reward models used in large language models (LLMs), highlighting the pote...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.15206] MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference

The paper presents MAVRL, a novel approach for learning reward functions from multiple feedback types using amortized variational inferen...

arXiv - AI · 4 min · about 2 months ago

Robotics

[2602.15076] Near-Optimal Sample Complexity for Online Constrained MDPs

This paper presents a model-based primal-dual algorithm for online constrained Markov Decision Processes (CMDPs), achieving near-optimal ...

arXiv - Machine Learning · 4 min · about 2 months ago


