AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min
AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min
AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min

All Content

[2603.05375] Robust Node Affinities via Jaccard-Biased Random Walks and Rank Aggregation
Machine Learning

arXiv - Machine Learning · 4 min
[2603.05327] FairFinGAN: Fairness-aware Synthetic Financial Data Generation
Machine Learning

arXiv - Machine Learning · 3 min
[2603.05293] Knowledge Divergence and the Value of Debate for Scalable Oversight
Machine Learning

arXiv - Machine Learning · 4 min
[2603.05175] Incentive Aware AI Regulations: A Credal Characterisation
Machine Learning

arXiv - Machine Learning · 4 min
[2603.04851] Why Is RLHF Alignment Shallow? A Gradient Analysis
LLMs

arXiv - Machine Learning · 3 min
[2603.04831] Missingness Bias Calibration in Feature Attribution Explanations
Machine Learning

arXiv - Machine Learning · 3 min
[2603.04703] Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
Machine Learning

arXiv - Machine Learning · 4 min
[2603.04595] A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments
AI Safety

arXiv - Machine Learning · 4 min
[2602.09980] Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks
Machine Learning

arXiv - Machine Learning · 4 min
[2510.00177] PrefDisco: Benchmarking Proactive Personalized Reasoning
LLMs

arXiv - AI · 4 min
[2509.23886] Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
LLMs

arXiv - Machine Learning · 4 min
[2508.06249] In-Training Defenses against Emergent Misalignment in Language Models
LLMs

arXiv - Machine Learning · 4 min
[2505.05589] ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
Robotics

arXiv - Machine Learning · 4 min
[2602.00485] Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
LLMs

arXiv - AI · 4 min
[2511.21033] Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning
LLMs

arXiv - AI · 4 min
[2511.04439] CoRPO: Adding a Correctness Bias to GRPO Improves Generalization
LLMs

arXiv - Machine Learning · 4 min
[2603.05228] The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology
Machine Learning

arXiv - Machine Learning · 4 min
[2603.05149] Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding
AI Safety

arXiv - Machine Learning · 4 min
[2603.04986] Debiasing Sequential Recommendation with Time-aware Inverse Propensity Scoring
Machine Learning

arXiv - AI · 4 min
[2603.04976] 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
LLMs

arXiv - AI · 4 min