AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization
Machine Learning · arXiv - AI · 4 min

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
LLMs · arXiv - AI · 4 min

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs · arXiv - AI · 3 min

All Content

[2603.03536] SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems
LLMs · arXiv - AI · 3 min

[2603.03920] BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning
Machine Learning · arXiv - AI · 4 min

[2603.03515] The Controllability Trap: A Governance Framework for Military AI Agents
Machine Learning · arXiv - AI · 3 min

[2603.03867] k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods
AI Safety · arXiv - Machine Learning · 4 min

[2603.03820] Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
AI Safety · arXiv - AI · 4 min

[2603.03662] Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling
Machine Learning · arXiv - AI · 4 min

[2603.03341] Ethical and Explainable AI in Reusable MLOps Pipelines
Machine Learning · arXiv - AI · 4 min

[2603.03507] Solving adversarial examples requires solving exponential misalignment
Machine Learning · arXiv - Machine Learning · 4 min

[2603.03326] Controllable and explainable personality sliders for LLMs at inference time
LLMs · arXiv - AI · 3 min

[2603.03469] Biased Generalization in Diffusion Models
Machine Learning · arXiv - Machine Learning · 4 min

[2603.03324] Controlling Chat Style in Language Models via Single-Direction Editing
LLMs · arXiv - AI · 3 min

[2603.03319] Automated Concept Discovery for LLM-as-a-Judge Preference Analysis
LLMs · arXiv - AI · 4 min

[2603.03312] Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
Machine Learning · arXiv - AI · 4 min

[2603.03308] Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
LLMs · arXiv - AI · 3 min

[2603.03303] HumanLM: Simulating Users with State Alignment Beats Response Imitation
LLMs · arXiv - AI · 4 min

[2603.03298] TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
LLMs · arXiv - AI · 4 min

[2603.03291] One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
LLMs · arXiv - AI · 3 min

[2603.04390] A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
LLMs · arXiv - AI · 3 min

[2603.03686] AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment
LLMs · arXiv - AI · 4 min

[2603.03655] Mozi: Governed Autonomy for Drug Discovery LLM Agents
LLMs · arXiv - AI · 4 min