AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, and how to reduce bias...

AI Events · 36 min ·
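
The teaser above defines AI bias but stops short of showing how it is measured. As a hedged illustration, not taken from the article itself, the short Python sketch below computes a demographic-parity gap, one common group-level check for skewed model outputs; every name and number in it is a placeholder.

```python
def selection_rate(predictions, group_labels, group):
    """Fraction of positive (1) predictions the model gives to one group."""
    picks = [p for p, g in zip(predictions, group_labels) if g == group]
    return sum(picks) / len(picks) if picks else 0.0


def demographic_parity_gap(predictions, group_labels, group_a, group_b):
    """Absolute difference in positive-prediction rates between two groups;
    a value near 0 means parity on this metric, a large value flags possible bias."""
    return abs(selection_rate(predictions, group_labels, group_a)
               - selection_rate(predictions, group_labels, group_b))


if __name__ == "__main__":
    # Toy predictions (1 = approved) for applicants from two illustrative groups.
    preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
    print(f"demographic parity gap: {demographic_parity_gap(preds, groups, 'a', 'b'):.2f}")
```

On the toy data the gap is 0.20, meaning group "a" receives positive predictions 20 percentage points more often than group "b"; a real audit would combine several such metrics rather than rely on this one.
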
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and grades...

Reddit - Machine Learning · 1 min ·
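
The post does not include its code, so the following is only a minimal sketch, under stated assumptions, of how such a harness could work: generate a question whose answer follows from a textbook formula, query a model, and grade the returned number against the known ground truth. The projectile-range formula, the 5% tolerance, and the `ask_model` stub are illustrative choices, not the author's benchmark.

```python
import math
import random


def make_projectile_question(rng):
    """Build one question with a known answer: range R = v^2 * sin(2*theta) / g."""
    v = rng.uniform(5.0, 50.0)       # launch speed in m/s
    theta = rng.uniform(10.0, 80.0)  # launch angle in degrees
    truth = v ** 2 * math.sin(math.radians(2 * theta)) / 9.81
    prompt = (f"A projectile is launched at {v:.1f} m/s and {theta:.1f} degrees "
              "on level ground. What is its range in meters? Answer with a number.")
    return prompt, truth


def grade(answer_text, truth, rel_tol=0.05):
    """Pass if the first number in the model's reply is within 5% of ground truth."""
    try:
        value = float(answer_text.strip().split()[0])
    except (ValueError, IndexError):
        return False
    return abs(value - truth) <= rel_tol * abs(truth)


def ask_model(prompt):
    # Placeholder for a real LLM call; here it just returns a fixed wrong guess.
    return "42.0"


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, truth = make_projectile_question(rng)
    print(f"truth = {truth:.1f} m, passed = {grade(ask_model(prompt), truth)}")
```

A real harness would swap `ask_model` for an actual LLM call and add many question families; the key idea is that each generated question carries its own machine-checkable answer.
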
Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can describe...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2603.21904] SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation for Medical Image Segmentation

Abstract page for arXiv paper 2603.21904: SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation...

arXiv - AI · 4 min ·
LLMs

[2603.21872] Manifold-Aware Exploration for Reinforcement Learning in Video Generation

Abstract page for arXiv paper 2603.21872: Manifold-Aware Exploration for Reinforcement Learning in Video Generation

arXiv - AI · 4 min ·
Machine Learning

[2603.21760] Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration

Abstract page for arXiv paper 2603.21760: Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration

arXiv - AI · 4 min ·
AI Safety

[2603.21735] Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

Abstract page for arXiv paper 2603.21735: Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

arXiv - AI · 4 min ·
LLMs

[2603.21697] Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Abstract page for arXiv paper 2603.21697: Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

arXiv - AI · 4 min ·
LLMs

[2603.21524] CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

Abstract page for arXiv paper 2603.21524: CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

arXiv - AI · 4 min ·
Machine Learning

[2603.21502] Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks

Abstract page for arXiv paper 2603.21502: Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks

arXiv - Machine Learning · 4 min ·
Robotics

[2603.21496] A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems

Abstract page for arXiv paper 2603.21496: A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical ...

arXiv - AI · 3 min ·
Machine Learning

[2603.21461] DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

Abstract page for arXiv paper 2603.21461: DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21359] Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF

Abstract page for arXiv paper 2603.21359: Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation ...

arXiv - AI · 4 min ·
Machine Learning

[2603.21213] Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

Abstract page for arXiv paper 2603.21213: Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

arXiv - AI · 3 min ·
LLMs

[2603.21276] Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

Abstract page for arXiv paper 2603.21276: Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

arXiv - Machine Learning · 4 min ·
LLMs

[2603.21175] Reward Sharpness-Aware Fine-Tuning for Diffusion Models

Abstract page for arXiv paper 2603.21175: Reward Sharpness-Aware Fine-Tuning for Diffusion Models

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21149] Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Abstract page for arXiv paper 2603.21149: Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based...

arXiv - AI · 3 min ·
Robotics

[2603.21046] SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments

Abstract page for arXiv paper 2603.21046: SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in ...

arXiv - AI · 4 min ·
LLMs

[2603.21016] Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

Abstract page for arXiv paper 2603.21016: Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21006] How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models

Abstract page for arXiv paper 2603.21006: How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models

arXiv - AI · 3 min ·
LLMs

[2603.20957] Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Abstract page for arXiv paper 2603.20957: Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

arXiv - AI · 4 min ·
Machine Learning

[2603.20953] Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

Abstract page for arXiv paper 2603.20953: Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

arXiv - AI · 4 min ·
LLMs

[2603.20939] User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

Abstract page for arXiv paper 2603.20939: User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

arXiv - AI · 4 min ·
