AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, and how to reduce bias...

AI Events · 36 min ·
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and grades...

Reddit - Machine Learning · 1 min ·
Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can describe...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.20899] Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach
LLMs

Abstract page for arXiv paper 2603.20899: Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach

arXiv - AI · 3 min ·
[2603.20656] Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics
NLP

Abstract page for arXiv paper 2603.20656: Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics

arXiv - Machine Learning · 3 min ·
[2603.20634] CFNN: Continued Fraction Neural Network
Machine Learning

Abstract page for arXiv paper 2603.20634: CFNN: Continued Fraction Neural Network

arXiv - Machine Learning · 3 min ·
[2603.20320] The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents
LLMs

Abstract page for arXiv paper 2603.20320: The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents

arXiv - Machine Learning · 4 min ·
[2603.20303] InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching
Machine Learning

Abstract page for arXiv paper 2603.20303: InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching

arXiv - AI · 4 min ·
[2603.20300] From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems
LLMs

Abstract page for arXiv paper 2603.20300: From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems

arXiv - AI · 3 min ·
[2603.20248] Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions
Machine Learning

Abstract page for arXiv paper 2603.20248: Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions

arXiv - AI · 4 min ·
[2603.20229] Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues
LLMs

Abstract page for arXiv paper 2603.20229: Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues

arXiv - AI · 4 min ·
[2603.21854] Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
LLMs

Abstract page for arXiv paper 2603.21854: Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

arXiv - AI · 4 min ·
[2603.21687] Mirage: The Illusion of Visual Understanding
Machine Learning

Abstract page for arXiv paper 2603.21687: Mirage: The Illusion of Visual Understanding

arXiv - AI · 4 min ·
[2603.21574] Adaptive Robust Estimator for Multi-Agent Reinforcement Learning
LLMs

Abstract page for arXiv paper 2603.21574: Adaptive Robust Estimator for Multi-Agent Reinforcement Learning

arXiv - AI · 3 min ·
[2603.21558] Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment
Machine Learning

Abstract page for arXiv paper 2603.21558: Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment

arXiv - AI · 4 min ·
[2603.21435] Behavioural feasible set: Value alignment constraints on AI decision support
AI Safety

Abstract page for arXiv paper 2603.21435: Behavioural feasible set: Value alignment constraints on AI decision support

arXiv - AI · 3 min ·
[2603.21362] AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
LLMs

Abstract page for arXiv paper 2603.21362: AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation

arXiv - AI · 3 min ·
[2603.21341] RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models
LLMs

Abstract page for arXiv paper 2603.21341: RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

arXiv - AI · 4 min ·
[2603.21340] ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture
Machine Learning

Abstract page for arXiv paper 2603.21340: ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture

arXiv - AI · 4 min ·
[2603.21321] Improving Coherence and Persistence in Agentic AI for System Optimization
LLMs

Abstract page for arXiv paper 2603.21321: Improving Coherence and Persistence in Agentic AI for System Optimization

arXiv - AI · 4 min ·
[2603.20925] Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
AI Safety

Abstract page for arXiv paper 2603.20925: Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions

arXiv - AI · 4 min ·
[2603.20815] GMPilot: An Expert AI Agent For FDA cGMP Compliance
NLP

Abstract page for arXiv paper 2603.20815: GMPilot: An Expert AI Agent For FDA cGMP Compliance

arXiv - AI · 3 min ·
[2603.20833] Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems
AI Safety

Abstract page for arXiv paper 2603.20833: Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

arXiv - AI · 3 min ·