AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents and am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min
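
The post doesn't say how the governance layer works, but a common way to turn agent logs into evidence is to make them tamper-evident, e.g. by hash-chaining each action record to its predecessor. A minimal illustrative sketch (every name here is hypothetical, not the poster's design):

```python
import hashlib
import json
import time

def append_record(chain: list, action: dict) -> dict:
    """Append an agent action as a tamper-evident record.

    Each record commits to the previous record's hash, so editing
    any past entry invalidates every hash that follows it.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"ts": time.time(), "action": action, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    record = {**body, "hash": digest}
    chain.append(record)
    return record

def verify(chain: list) -> bool:
    """Recompute every hash; False means the history was altered."""
    prev = "0" * 64
    for rec in chain:
        body = {"ts": rec["ts"], "action": rec["action"], "prev_hash": rec["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, {"tool": "web.search", "query": "flight prices"})
append_record(chain, {"tool": "email.send", "to": "user@example.com"})
print(verify(chain))  # True; mutate any record above and this turns False
```
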
AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min

LLMs

[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

Abstract page for arXiv paper 2504.05995: NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

arXiv - AI · 4 min

All Content

AI Safety

[2602.14503] Bounding Probabilities of Causation with Partial Causal Diagrams

This paper presents a framework for bounding probabilities of causation using partial causal diagrams, addressing limitations of existing...

arXiv - AI · 3 min
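
As background for this line of work (the classical result, not this paper's contribution): for a binary treatment X and outcome Y, Tian and Pearl (2000) showed that interventional data alone bound the probability of necessity and sufficiency (PNS); per its title, this paper extends such bounding to settings where only a partial causal diagram is available. The classical bounds:

```latex
% PNS = P(Y_x = y, Y_{x'} = y') for binary treatment X and outcome Y.
% Tian-Pearl (2000) bounds from interventional quantities alone:
\max\{0,\; P(y_x) - P(y_{x'})\}
  \;\le\; \mathrm{PNS} \;\le\;
\min\{P(y_x),\; P(y'_{x'})\}
```
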
LLMs

[2602.14457] Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

This technical report presents a comprehensive risk analysis framework for frontier AI, focusing on emerging threats and mitigation strat...

arXiv - Machine Learning · 4 min

LLMs

[2602.14451] Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning

The paper introduces Precedent-Informed Reasoning (PIR) to enhance reasoning in Large Language Models (LLMs) by leveraging past cases, im...

arXiv - AI · 4 min

Machine Learning

[2602.13910] Sufficient Conditions for Stability of Minimum-Norm Interpolating Deep ReLU Networks

This paper explores the stability of minimum-norm interpolating deep ReLU networks, identifying conditions under which these networks mai...

arXiv - AI · 4 min

LLMs

[2602.13857] sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals

The paper presents sleep2vec, a model for aligning diverse nocturnal biosignals to improve sleep staging and clinical assessments, addres...

arXiv - Machine Learning · 4 min

LLMs

[2602.14370] Competition for attention predicts good-to-bad tipping in AI

This paper explores how competition for attention in AI systems can lead to tipping points from beneficial to harmful outcomes, providing...

arXiv - AI · 3 min

LLMs

[2602.14307] Benchmarking at the Edge of Comprehension

This article discusses the challenges of benchmarking Large Language Models (LLMs) as they reach new performance levels, introducing a fr...

arXiv - Machine Learning · 4 min

NLP

[2602.14252] GRAIL: Goal Recognition Alignment through Imitation Learning

The paper introduces GRAIL, a method for recognizing agent goals through imitation learning, enhancing goal recognition accuracy in AI sy...

arXiv - Machine Learning · 3 min

LLMs

[2602.13791] MechPert: Mechanistic Consensus as an Inductive Bias for Unseen Perturbation Prediction

The paper introduces MechPert, a framework that enhances unseen genetic perturbation prediction by leveraging mechanistic consensus among...

arXiv - AI · 3 min

AI Safety

[2602.14135] ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

The paper presents the ForesightSafety Bench, a comprehensive framework for evaluating AI safety risks, addressing limitations in current...

arXiv - AI · 4 min

LLMs

[2602.14095] NEST: Nascent Encoded Steganographic Thoughts

The paper 'NEST: Nascent Encoded Steganographic Thoughts' explores the potential for large language models (LLMs) to conceal reasoning wi...

arXiv - AI · 3 min

Machine Learning

[2602.14093] GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

The paper presents GUI-GENESIS, a framework for automating the synthesis of efficient training environments for GUI agents, enhancing per...

arXiv - Machine Learning · 3 min

Machine Learning

[2602.14065] REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

The paper presents the REAL framework, which addresses knowledge conflicts in Knowledge-Intensive Visual Question Answering (KI-VQA) by i...

arXiv - AI · 3 min

LLMs

[2602.13699] Attention Head Entropy of LLMs Predicts Answer Correctness

This paper introduces Head Entropy, a method for predicting answer correctness in large language models (LLMs) by analyzing attention ent...

arXiv - Machine Learning · 3 min
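
The abstract is cut off above, so purely as illustrative background (not necessarily the paper's estimator): the entropy of an attention head is the Shannon entropy of its attention distribution, averaged over query positions. A minimal PyTorch sketch, with all names hypothetical:

```python
import torch

def attention_head_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy per attention head.

    attn: attention weights of shape (heads, query_len, key_len),
          each row a probability distribution over keys.
    """
    eps = 1e-12  # guard against log(0)
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, query_len)
    return ent.mean(dim=-1)                         # (heads,)

# Toy usage: 4 heads, 8 queries attending over 8 keys.
attn = torch.softmax(torch.randn(4, 8, 8), dim=-1)
print(attention_head_entropy(attn))  # low entropy = sharply focused head
```
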
Machine Learning

[2602.13985] Bridging AI and Clinical Reasoning: Abductive Explanations for Alignment on Critical Symptoms

This article discusses the integration of AI in clinical diagnostics, focusing on the use of abductive explanations to enhance AI's align...

arXiv - AI · 3 min

Computer Vision

[2602.13660] Optimized Certainty Equivalent Risk-Controlling Prediction Sets

This paper presents the Optimized Certainty Equivalent Risk-Controlling Prediction Sets (OCE-RCPS), a framework designed to enhance relia...

arXiv - Machine Learning · 3 min

Machine Learning

[2602.13936] A Generalizable Physics-guided Causal Model for Trajectory Prediction in Autonomous Driving

This paper presents a Physics-guided Causal Model for trajectory prediction in autonomous driving, focusing on zero-shot generalization a...

arXiv - AI · 3 min

Machine Learning

[2602.13651] Cumulative Utility Parity for Fair Federated Learning under Intermittent Client Participation

This paper introduces the concept of cumulative utility parity in federated learning, addressing fairness in client participation, partic...

arXiv - AI · 3 min

LLMs

[2602.13904] Diagnosing Pathological Chain-of-Thought in Reasoning Models

This paper discusses the identification and diagnosis of pathological chain-of-thought reasoning in AI models, highlighting three specifi...

arXiv - AI · 3 min

AI Agents

[2602.13855] From Fluent to Verifiable: Claim-Level Auditability for Deep Research Agents

The paper discusses the need for claim-level auditability in deep research agents, highlighting the shift from factual errors to weak cla...

arXiv - AI · 3 min