AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min

All Content

[2602.15391] Improving LLM Reliability through Hybrid Abstention and Adaptive Detection
LLMs

The paper presents a novel adaptive abstention system for Large Language Models (LLMs) that balances safety and utility by dynamically ad...

arXiv - AI · 4 min
[2602.15384] World-Model-Augmented Web Agents with Action Correction
LLMs

The paper presents WAC, a web agent that enhances task execution by integrating model collaboration, consequence simulation, and action r...

arXiv - AI · 3 min
[2602.15161] Exploiting Layer-Specific Vulnerabilities to Backdoor Attack in Federated Learning
Machine Learning

This paper presents the Layer Smoothing Attack (LSA), a novel backdoor attack exploiting layer-specific vulnerabilities in federated lear...

arXiv - Machine Learning · 4 min
[2602.15298] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection
Machine Learning

The paper presents X-MAP, a framework for analyzing and profiling misclassifications in spam and phishing detection, enhancing interpreta...

arXiv - AI · 3 min
[2602.15274] When Remembering and Planning are Worth it: Navigating under Change
Machine Learning

This article explores how various memory types enhance spatial navigation in changing environments, highlighting the efficiency of agents...

arXiv - Machine Learning · 4 min
[2602.15173] Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs
LLMs

This paper examines the decision-making behaviors of large language models (LLMs) under uncertainty, contrasting reasoning models with co...

arXiv - AI · 3 min
[2602.15212] Secure and Energy-Efficient Wireless Agentic AI Networks
AI Agents

This paper presents a secure and energy-efficient wireless AI network that utilizes a supervisor AI agent to optimize reasoning tasks whi...

arXiv - AI · 4 min
[2602.15158] da Costa and Tarski meet Goguen and Carnap: a novel approach for ontological heterogeneity based on consequence systems
AI Agents

This paper introduces a novel approach to ontological heterogeneity, integrating concepts from Carnapian-Goguenism and consequence system...

arXiv - AI · 3 min
[2602.15143] Protecting Language Models Against Unauthorized Distillation through Trace Rewriting
LLMs

This paper explores methods to protect language models from unauthorized knowledge distillation by modifying reasoning traces, focusing o...

arXiv - AI · 3 min
[2602.15829] Operationalising the Superficial Alignment Hypothesis via Task Complexity
LLMs

This article presents a new metric called task complexity to operationalize the Superficial Alignment Hypothesis, demonstrating how pre-t...

arXiv - Machine Learning · 4 min
[2602.15799] The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
LLMs

This paper explores how fine-tuning language models can inadvertently degrade safety measures, revealing structural vulnerabilities in al...

arXiv - AI · 4 min
[2602.15763] GLM-5: from Vibe Coding to Agentic Engineering
LLMs

GLM-5 introduces a next-generation foundation model that enhances coding capabilities through agentic engineering, reducing costs while i...

arXiv - Machine Learning · 5 min
[2602.15676] Relative Geometry of Neural Forecasters: Linking Accuracy and Alignment in Learned Latent Geometry
Machine Learning

This paper explores how neural networks represent latent geometry in forecasting complex dynamical systems, linking model alignment with ...

arXiv - AI · 3 min
[2602.15637] The Stationarity Bias: Stratified Stress-Testing for Time-Series Imputation in Regulated Dynamical Systems
AI Safety

The paper discusses the 'Stationarity Bias' in time-series imputation, proposing a 'Stratified Stress-Test' to evaluate methods under dif...

arXiv - Machine Learning · 4 min
[2602.15602] Certified Per-Instance Unlearning Using Individual Sensitivity Bounds
Machine Learning

This article presents a novel approach to certified machine unlearning through adaptive per-instance noise calibration, significantly red...

arXiv - Machine Learning · 3 min
[2602.15586] Uniform error bounds for quantized dynamical models
Machine Learning

This paper presents uniform error bounds for quantized dynamical models, providing statistical guarantees on their accuracy when learned ...

arXiv - Machine Learning · 3 min
[2602.15571] Accelerated Predictive Coding Networks via Direct Kolen-Pollack Feedback Alignment
Machine Learning

The paper introduces Direct Kolen-Pollack Predictive Coding (DKP-PC), an innovative approach that enhances the efficiency of predictive c...

arXiv - Machine Learning · 3 min
[2602.15515] The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
Machine Learning

The paper explores how AI models can learn to obfuscate deception when trained against white-box deception detectors, introducing a taxon...

arXiv - AI · 4 min
[2602.15503] Approximation Theory for Lipschitz Continuous Transformers
Machine Learning

This paper explores the approximation theory for Lipschitz continuous Transformers, establishing a theoretical foundation for their stabi...

arXiv - Machine Learning · 3 min
[2602.15481] LLM-as-Judge on a Budget
LLMs

The paper presents a novel approach to efficiently evaluate large language models (LLMs) under budget constraints, utilizing multi-armed ...

arXiv - Machine Learning · 3 min