AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
Machine Learning · arXiv - AI · 4 min

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
LLMs · arXiv - AI · 4 min

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs · arXiv - AI · 3 min

All Content

[2603.03536] SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems
LLMs · arXiv - AI · 3 min

[2603.03920] BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
Machine Learning · arXiv - AI · 4 min

[2603.03515] The Controllability Trap: A Governance Framework for Military AI Agents
Machine Learning · arXiv - AI · 3 min

[2603.03867] k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods
AI Safety · arXiv - Machine Learning · 4 min

[2603.03820] Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
AI Safety · arXiv - AI · 4 min

[2603.03662] Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling
Machine Learning · arXiv - AI · 4 min

[2603.03341] Ethical and Explainable AI in Reusable MLOps Pipelines
Machine Learning · arXiv - AI · 4 min

[2603.03507] Solving adversarial examples requires solving exponential misalignment
Machine Learning · arXiv - Machine Learning · 4 min

[2603.03326] Controllable and explainable personality sliders for LLMs at inference time
LLMs · arXiv - AI · 3 min

[2603.03469] Biased Generalization in Diffusion Models
Machine Learning · arXiv - Machine Learning · 4 min

[2603.03324] Controlling Chat Style in Language Models via Single-Direction Editing
LLMs · arXiv - AI · 3 min

[2603.03319] Automated Concept Discovery for LLM-as-a-Judge Preference Analysis
LLMs · arXiv - AI · 4 min

[2603.03312] Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
Machine Learning · arXiv - AI · 4 min

[2603.03308] Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
LLMs · arXiv - AI · 3 min

[2603.03303] HumanLM: Simulating Users with State Alignment Beats Response Imitation
LLMs · arXiv - AI · 4 min

[2603.03298] TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
LLMs · arXiv - AI · 4 min

[2603.03291] One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
LLMs · arXiv - AI · 3 min

[2603.04390] A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
LLMs · arXiv - AI · 3 min

[2603.03686] AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment
LLMs · arXiv - AI · 4 min

[2603.03655] Mozi: Governed Autonomy for Drug Discovery LLM Agents
LLMs · arXiv - AI · 4 min