AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, and how to reduce bias...

AI Events · 36 min ·
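
The teaser above defines AI bias but stops short of showing how it is measured. As a hedged illustration, not taken from the article itself, the short Python sketch below computes a demographic-parity gap, one common group-level check for skewed model outputs; every name and number in it is a placeholder.

```python
def selection_rate(predictions, group_labels, group):
    """Fraction of positive (1) predictions the model gives to one group."""
    picks = [p for p, g in zip(predictions, group_labels) if g == group]
    return sum(picks) / len(picks) if picks else 0.0


def demographic_parity_gap(predictions, group_labels, group_a, group_b):
    """Absolute difference in positive-prediction rates between two groups;
    a value near 0 means parity on this metric, a large value flags possible bias."""
    return abs(selection_rate(predictions, group_labels, group_a)
               - selection_rate(predictions, group_labels, group_b))


if __name__ == "__main__":
    # Toy predictions (1 = approved) for applicants from two illustrative groups.
    preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
    print(f"demographic parity gap: {demographic_parity_gap(preds, groups, 'a', 'b'):.2f}")
```

On the toy data the gap is 0.20, meaning group "a" receives positive predictions 20 percentage points more often than group "b"; a real audit would combine several such metrics rather than rely on this one.
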
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and grades...

Reddit - Machine Learning · 1 min ·
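
The post does not include its code, so the following is only a minimal sketch, under stated assumptions, of how such a harness could work: generate a question whose answer follows from a textbook formula, query a model, and grade the returned number against the known ground truth. The projectile-range formula, the 5% tolerance, and the `ask_model` stub are illustrative choices, not the author's benchmark.

```python
import math
import random


def make_projectile_question(rng):
    """Build one question with a known answer: range R = v^2 * sin(2*theta) / g."""
    v = rng.uniform(5.0, 50.0)       # launch speed in m/s
    theta = rng.uniform(10.0, 80.0)  # launch angle in degrees
    truth = v ** 2 * math.sin(math.radians(2 * theta)) / 9.81
    prompt = (f"A projectile is launched at {v:.1f} m/s and {theta:.1f} degrees "
              "on level ground. What is its range in meters? Answer with a number.")
    return prompt, truth


def grade(answer_text, truth, rel_tol=0.05):
    """Pass if the first number in the model's reply is within 5% of ground truth."""
    try:
        value = float(answer_text.strip().split()[0])
    except (ValueError, IndexError):
        return False
    return abs(value - truth) <= rel_tol * abs(truth)


def ask_model(prompt):
    # Placeholder for a real LLM call; here it just returns a fixed wrong guess.
    return "42.0"


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, truth = make_projectile_question(rng)
    print(f"truth = {truth:.1f} m, passed = {grade(ask_model(prompt), truth)}")
```

A real harness would swap `ask_model` for an actual LLM call and add many question families; the key idea is that each generated question carries its own machine-checkable answer.
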
Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can describe...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2603.21904] SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation for Medical Image Segmentation

Abstract page for arXiv paper 2603.21904: SHAPE: Structure-aware Hierarchical Unsupervised Domain Adaptation with Plausibility Evaluation...

arXiv - AI · 4 min ·
LLMs

[2603.21872] Manifold-Aware Exploration for Reinforcement Learning in Video Generation

Abstract page for arXiv paper 2603.21872: Manifold-Aware Exploration for Reinforcement Learning in Video Generation

arXiv - AI · 4 min ·
Machine Learning

[2603.21760] Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration

Abstract page for arXiv paper 2603.21760: Cycle Inverse-Consistent TransMorph: A Balanced Deep Learning Framework for Brain MRI Registration

arXiv - AI · 4 min ·
AI Safety

[2603.21735] Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

Abstract page for arXiv paper 2603.21735: Cognitive Agency Surrender: Defending Epistemic Sovereignty via Scaffolded AI Friction

arXiv - AI · 4 min ·
LLMs

[2603.21697] Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Abstract page for arXiv paper 2603.21697: Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

arXiv - AI · 4 min ·
LLMs

[2603.21524] CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

Abstract page for arXiv paper 2603.21524: CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs

arXiv - AI · 4 min ·
Machine Learning

[2603.21502] Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks

Abstract page for arXiv paper 2603.21502: Quotient Geometry, Effective Curvature, and Implicit Bias in Simple Shallow Neural Networks

arXiv - Machine Learning · 4 min ·
Robotics

[2603.21496] A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems

Abstract page for arXiv paper 2603.21496: A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical ...

arXiv - AI · 3 min ·
Machine Learning

[2603.21461] DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

Abstract page for arXiv paper 2603.21461: DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21359] Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF

Abstract page for arXiv paper 2603.21359: Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation ...

arXiv - AI · 4 min ·
Machine Learning

[2603.21213] Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

Abstract page for arXiv paper 2603.21213: Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

arXiv - AI · 3 min ·
LLMs

[2603.21276] Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

Abstract page for arXiv paper 2603.21276: Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

arXiv - Machine Learning · 4 min ·
LLMs

[2603.21175] Reward Sharpness-Aware Fine-Tuning for Diffusion Models

Abstract page for arXiv paper 2603.21175: Reward Sharpness-Aware Fine-Tuning for Diffusion Models

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21149] Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Abstract page for arXiv paper 2603.21149: Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based...

arXiv - AI · 3 min ·
Robotics

[2603.21046] SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments

Abstract page for arXiv paper 2603.21046: SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in ...

arXiv - AI · 4 min ·
LLMs

[2603.21016] Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

Abstract page for arXiv paper 2603.21016: Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

arXiv - Machine Learning · 3 min ·
LLMs

[2603.21006] How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models

Abstract page for arXiv paper 2603.21006: How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models

arXiv - AI · 3 min ·
LLMs

[2603.20957] Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Abstract page for arXiv paper 2603.20957: Alignment Whack-a-Mole : Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

arXiv - AI · 4 min ·
Machine Learning

[2603.20953] Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

Abstract page for arXiv paper 2603.20953: Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents

arXiv - AI · 4 min ·
LLMs

[2603.20939] User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

Abstract page for arXiv paper 2603.20939: User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

arXiv - AI · 4 min ·
