AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 2 hours ago

LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min · about 8 hours ago

Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can descri...

Reddit - Artificial Intelligence · 1 min · 1 day ago

All Content

Machine Learning

[2603.20631] LassoFlexNet: Flexible Neural Architecture for Tabular Data

Abstract page for arXiv paper 2603.20631: LassoFlexNet: Flexible Neural Architecture for Tabular Data

arXiv - Machine Learning · 3 min · 5 days ago

Machine Learning

[2603.20388] From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

Abstract page for arXiv paper 2603.20388: From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

arXiv - Machine Learning · 3 min · 5 days ago

LLMs

[2603.20212] Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

Abstract page for arXiv paper 2603.20212: Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

arXiv - Machine Learning · 3 min · 5 days ago

AI Safety

[2603.20198] Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

Abstract page for arXiv paper 2603.20198: Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

arXiv - Machine Learning · 4 min · 5 days ago

NLP

[2603.22155] RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

Abstract page for arXiv paper 2603.22155: RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

arXiv - Machine Learning · 3 min · 5 days ago

Machine Learning

[2603.21612] Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction

Abstract page for arXiv paper 2603.21612: Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2603.21584] SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models

Abstract page for arXiv paper 2603.21584: SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2603.21567] Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

Abstract page for arXiv paper 2603.21567: Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

arXiv - Machine Learning · 3 min · 5 days ago

AI Safety

[2603.21491] Learning Can Converge Stably to the Wrong Belief under Latent Reliability

Abstract page for arXiv paper 2603.21491: Learning Can Converge Stably to the Wrong Belief under Latent Reliability

arXiv - Machine Learning · 3 min · 5 days ago

AI Safety

[2603.21485] Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

Abstract page for arXiv paper 2603.21485: Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

arXiv - Machine Learning · 4 min · 5 days ago

Machine Learning

[2603.21393] A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification Tasks

Abstract page for arXiv paper 2603.21393: A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Cla...

arXiv - Machine Learning · 3 min · 5 days ago

Machine Learning

[2603.21319] Active Inference Agency Formalization, Metrics, and Convergence Assessments

Abstract page for arXiv paper 2603.21319: Active Inference Agency Formalization, Metrics, and Convergence Assessments

arXiv - Machine Learning · 4 min · 5 days ago

Machine Learning

[2603.21315] FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

Abstract page for arXiv paper 2603.21315: FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2603.20921] Discriminative Representation Learning for Clinical Prediction

Abstract page for arXiv paper 2603.20921: Discriminative Representation Learning for Clinical Prediction

arXiv - Machine Learning · 3 min · 5 days ago

Machine Learning

[2603.20775] Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness

Abstract page for arXiv paper 2603.20775: Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Ro...

arXiv - Machine Learning · 4 min · 5 days ago

Machine Learning

[2603.20687] Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural Networks

Abstract page for arXiv paper 2603.20687: Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural N...

arXiv - Machine Learning · 3 min · 5 days ago

LLMs

[2603.20632] Optimal low-rank stochastic gradient estimation for LLM training

Abstract page for arXiv paper 2603.20632: Optimal low-rank stochastic gradient estimation for LLM training

arXiv - Machine Learning · 3 min · 5 days ago

Machine Learning

[2603.20453] Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret

Abstract page for arXiv paper 2603.20453: Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2603.17655] Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

Abstract page for arXiv paper 2603.17655: Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

arXiv - AI · 4 min · 5 days ago

LLMs

[2603.14602] PA3: Policy-Aware Agent Alignment through Chain-of-Thought

Abstract page for arXiv paper 2603.14602: PA3: Policy-Aware Agent Alignment through Chain-of-Thought

arXiv - Machine Learning · 3 min · 5 days ago
