AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min ·
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min ·
Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can descri...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.20631] LassoFlexNet: Flexible Neural Architecture for Tabular Data
Machine Learning

Abstract page for arXiv paper 2603.20631: LassoFlexNet: Flexible Neural Architecture for Tabular Data

arXiv - Machine Learning · 3 min ·
[2603.20388] From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators
Machine Learning

Abstract page for arXiv paper 2603.20388: From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

arXiv - Machine Learning · 3 min ·
[2603.20212] Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
LLMs

Abstract page for arXiv paper 2603.20212: Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

arXiv - Machine Learning · 3 min ·
[2603.20198] Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning
AI Safety

Abstract page for arXiv paper 2603.20198: Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

arXiv - Machine Learning · 4 min ·
[2603.22155] RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation
NLP

Abstract page for arXiv paper 2603.22155: RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

arXiv - Machine Learning · 3 min ·
[2603.21612] Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
Machine Learning

Abstract page for arXiv paper 2603.21612: Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction

arXiv - Machine Learning · 4 min ·
[2603.21584] SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models
LLMs

Abstract page for arXiv paper 2603.21584: SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models

arXiv - Machine Learning · 4 min ·
[2603.21567] Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy
LLMs

Abstract page for arXiv paper 2603.21567: Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

arXiv - Machine Learning · 3 min ·
[2603.21491] Learning Can Converge Stably to the Wrong Belief under Latent Reliability
AI Safety

Abstract page for arXiv paper 2603.21491: Learning Can Converge Stably to the Wrong Belief under Latent Reliability

arXiv - Machine Learning · 3 min ·
[2603.21485] Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
AI Safety

Abstract page for arXiv paper 2603.21485: Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

arXiv - Machine Learning · 4 min ·
[2603.21393] A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification Tasks
Machine Learning

Abstract page for arXiv paper 2603.21393: A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Cla...

arXiv - Machine Learning · 3 min ·
[2603.21319] Active Inference Agency Formalization, Metrics, and Convergence Assessments
Machine Learning

Abstract page for arXiv paper 2603.21319: Active Inference Agency Formalization, Metrics, and Convergence Assessments

arXiv - Machine Learning · 4 min ·
[2603.21315] FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models
Machine Learning

Abstract page for arXiv paper 2603.21315: FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

arXiv - Machine Learning · 4 min ·
[2603.20921] Discriminative Representation Learning for Clinical Prediction
LLMs

Abstract page for arXiv paper 2603.20921: Discriminative Representation Learning for Clinical Prediction

arXiv - Machine Learning · 3 min ·
[2603.20775] Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness
Machine Learning

Abstract page for arXiv paper 2603.20775: Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Ro...

arXiv - Machine Learning · 4 min ·
[2603.20687] Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural Networks
Machine Learning

Abstract page for arXiv paper 2603.20687: Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural N...

arXiv - Machine Learning · 3 min ·
[2603.20632] Optimal low-rank stochastic gradient estimation for LLM training
LLMs

Abstract page for arXiv paper 2603.20632: Optimal low-rank stochastic gradient estimation for LLM training

arXiv - Machine Learning · 3 min ·
[2603.20453] Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret
Machine Learning

Abstract page for arXiv paper 2603.20453: Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret

arXiv - Machine Learning · 4 min ·
[2603.17655] Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment
LLMs

Abstract page for arXiv paper 2603.17655: Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

arXiv - AI · 4 min ·
[2603.14602] PA3: Policy-Aware Agent Alignment through Chain-of-Thought
LLMs

Abstract page for arXiv paper 2603.14602: PA3: Policy-Aware Agent Alignment through Chain-of-Thought

arXiv - Machine Learning · 3 min ·