AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min · about 3 hours ago

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 13 hours ago

All Content

Machine Learning

[2603.05375] Robust Node Affinities via Jaccard-Biased Random Walks and Rank Aggregation

Abstract page for arXiv paper 2603.05375: Robust Node Affinities via Jaccard-Biased Random Walks and Rank Aggregation

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.05327] FairFinGAN: Fairness-aware Synthetic Financial Data Generation

Abstract page for arXiv paper 2603.05327: FairFinGAN: Fairness-aware Synthetic Financial Data Generation

arXiv - Machine Learning · 3 min · 24 days ago

Machine Learning

[2603.05293] Knowledge Divergence and the Value of Debate for Scalable Oversight

Abstract page for arXiv paper 2603.05293: Knowledge Divergence and the Value of Debate for Scalable Oversight

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.05175] Incentive Aware AI Regulations: A Credal Characterisation

Abstract page for arXiv paper 2603.05175: Incentive Aware AI Regulations: A Credal Characterisation

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2603.04851] Why Is RLHF Alignment Shallow? A Gradient Analysis

Abstract page for arXiv paper 2603.04851: Why Is RLHF Alignment Shallow? A Gradient Analysis

arXiv - Machine Learning · 3 min · 24 days ago

Machine Learning

[2603.04831] Missingness Bias Calibration in Feature Attribution Explanations

Abstract page for arXiv paper 2603.04831: Missingness Bias Calibration in Feature Attribution Explanations

arXiv - Machine Learning · 3 min · 24 days ago

Machine Learning

[2603.04703] Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness

Abstract page for arXiv paper 2603.04703: Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness

arXiv - Machine Learning · 4 min · 24 days ago

AI Safety

[2603.04595] A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthcare Data Environments

Abstract page for arXiv paper 2603.04595: A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthca...

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2602.09980] Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks

Abstract page for arXiv paper 2602.09980: Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Info...

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2510.00177] PrefDisco: Benchmarking Proactive Personalized Reasoning

Abstract page for arXiv paper 2510.00177: PrefDisco: Benchmarking Proactive Personalized Reasoning

arXiv - AI · 4 min · 24 days ago

LLMs

[2509.23886] Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer

Abstract page for arXiv paper 2509.23886: Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2508.06249] In-Training Defenses against Emergent Misalignment in Language Models

Abstract page for arXiv paper 2508.06249: In-Training Defenses against Emergent Misalignment in Language Models

arXiv - Machine Learning · 4 min · 24 days ago

Robotics

[2505.05589] ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation

Abstract page for arXiv paper 2505.05589: ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance...

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2602.00485] Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

Abstract page for arXiv paper 2602.00485: Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

arXiv - AI · 4 min · 24 days ago

LLMs

[2511.21033] Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning

Abstract page for arXiv paper 2511.21033: Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning

arXiv - AI · 4 min · 24 days ago

LLMs

[2511.04439] CoRPO: Adding a Correctness Bias to GRPO Improves Generalization

Abstract page for arXiv paper 2511.04439: CoRPO: Adding a Correctness Bias to GRPO Improves Generalization

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.05228] The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology

Abstract page for arXiv paper 2603.05228: The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology

arXiv - Machine Learning · 4 min · 24 days ago

AI Safety

[2603.05149] Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding

Abstract page for arXiv paper 2603.05149: Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.04986] Debiasing Sequential Recommendation with Time-aware Inverse Propensity Scoring

Abstract page for arXiv paper 2603.04986: Debiasing Sequential Recommendation with Time-aware Inverse Propensity Scoring

arXiv - AI · 4 min · 24 days ago

LLMs

[2603.04976] 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

Abstract page for arXiv paper 2603.04976: 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

arXiv - AI · 4 min · 24 days ago


