AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, and how to reduce bias...

AI Events · 36 min ·
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and grades...

Reddit - Machine Learning · 1 min ·
Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can describe...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.20899] Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach
LLMs

Abstract page for arXiv paper 2603.20899: Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach

arXiv - AI · 3 min ·
[2603.20656] Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics
NLP

Abstract page for arXiv paper 2603.20656: Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics

arXiv - Machine Learning · 3 min ·
[2603.20634] CFNN: Continued Fraction Neural Network
Machine Learning

Abstract page for arXiv paper 2603.20634: CFNN: Continued Fraction Neural Network

arXiv - Machine Learning · 3 min ·
[2603.20320] The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents
LLMs

Abstract page for arXiv paper 2603.20320: The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents

arXiv - Machine Learning · 4 min ·
[2603.20303] InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching
Machine Learning

Abstract page for arXiv paper 2603.20303: InjectFlow: Weak Guides Strong via Orthogonal Injection for Flow Matching

arXiv - AI · 4 min ·
[2603.20300] From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems
LLMs

Abstract page for arXiv paper 2603.20300: From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems

arXiv - AI · 3 min ·
[2603.20248] Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions
Machine Learning

Abstract page for arXiv paper 2603.20248: Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions

arXiv - AI · 4 min ·
[2603.20229] Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues
LLMs

Abstract page for arXiv paper 2603.20229: Characterizing the ability of LLMs to recapitulate Americans' distributional responses to public opinion polling questions across political issues

arXiv - AI · 4 min ·
[2603.21854] Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
LLMs

Abstract page for arXiv paper 2603.21854: Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

arXiv - AI · 4 min ·
[2603.21687] Mirage: The Illusion of Visual Understanding
Machine Learning

Abstract page for arXiv paper 2603.21687: Mirage: The Illusion of Visual Understanding

arXiv - AI · 4 min ·
[2603.21574] Adaptive Robust Estimator for Multi-Agent Reinforcement Learning
LLMs

Abstract page for arXiv paper 2603.21574: Adaptive Robust Estimator for Multi-Agent Reinforcement Learning

arXiv - AI · 3 min ·
[2603.21558] Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment
Machine Learning

Abstract page for arXiv paper 2603.21558: Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment

arXiv - AI · 4 min ·
[2603.21435] Behavioural feasible set: Value alignment constraints on AI decision support
AI Safety

Abstract page for arXiv paper 2603.21435: Behavioural feasible set: Value alignment constraints on AI decision support

arXiv - AI · 3 min ·
[2603.21362] AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation
LLMs

Abstract page for arXiv paper 2603.21362: AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation

arXiv - AI · 3 min ·
[2603.21341] RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models
LLMs

Abstract page for arXiv paper 2603.21341: RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

arXiv - AI · 4 min ·
[2603.21340] ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture
Machine Learning

Abstract page for arXiv paper 2603.21340: ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture

arXiv - AI · 4 min ·
[2603.21321] Improving Coherence and Persistence in Agentic AI for System Optimization
LLMs

Abstract page for arXiv paper 2603.21321: Improving Coherence and Persistence in Agentic AI for System Optimization

arXiv - AI · 4 min ·
[2603.20925] Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
AI Safety

Abstract page for arXiv paper 2603.20925: Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions

arXiv - AI · 4 min ·
[2603.20815] GMPilot: An Expert AI Agent For FDA cGMP Compliance
NLP

Abstract page for arXiv paper 2603.20815: GMPilot: An Expert AI Agent For FDA cGMP Compliance

arXiv - AI · 3 min ·
[2603.20833] Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems
AI Safety

Abstract page for arXiv paper 2603.20833: Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

arXiv - AI · 3 min ·