AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Implementing advanced AI technologies in finance | MIT Technology Review
AI Safety

In finance departments that have long been defined by precision and control, AI has arrived less as a neatly managed upgrade than as a qu...

MIT Technology Review · 4 min
[2602.07026] Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
LLMs

arXiv - AI · 4 min
[2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics
Machine Learning

arXiv - AI · 4 min

All Content

[2605.06669] Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs
LLMs

arXiv - AI · 3 min
[2605.08012] Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
AI Safety

arXiv - AI · 3 min
[2605.07914] Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
AI Safety

arXiv - Machine Learning · 4 min
[2605.07844] Distributional simplicity bias and effective convexity in Energy Based Models
Machine Learning

arXiv - Machine Learning · 3 min
[2605.07724] Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Machine Learning

arXiv - AI · 3 min
[2605.07598] Optimal Recourse Summaries via Bi-Objective Decision Tree Learning
AI Safety

arXiv - Machine Learning · 3 min
[2605.07551] Disagreement-Regularized Importance Sampling for Adversarial Label Corruption
Machine Learning

arXiv - Machine Learning · 3 min
[2605.07483] Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization
Machine Learning

arXiv - AI · 4 min
[2605.07456] Inference-Time Attribute Distribution Alignment for Unconditional Diffusion
Machine Learning

arXiv - Machine Learning · 3 min
[2605.07407] Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer
LLMs

arXiv - Machine Learning · 3 min
[2605.07397] Have Graph -- Will Lift? The Case for Higher-Order Benchmarks
Machine Learning

arXiv - Machine Learning · 3 min
[2605.07396] Rubric-based On-policy Distillation
Machine Learning

arXiv - AI · 3 min
[2605.07331] Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective
LLMs

arXiv - AI · 4 min
[2605.07133] GAD in the Wild: Benchmarking Graph Anomaly Detection under Realistic Deployment Challenges
Machine Learning

arXiv - AI · 4 min
[2605.07105] Theoretical Limits of Language Model Alignment
LLMs

arXiv - Machine Learning · 4 min
[2605.07094] Actor-Critic with Active Importance Sampling
AI Safety

arXiv - Machine Learning · 3 min
[2605.06987] Response Time Enhances Alignment with Heterogeneous Preferences
LLMs

arXiv - Machine Learning · 4 min
[2605.06977] $f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses
LLMs

arXiv - AI · 3 min
[2605.06979] PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction
Machine Learning

arXiv - AI · 4 min
[2605.06939] Bias and Uncertainty in LLM-as-a-Judge Estimation
LLMs

arXiv - Machine Learning · 3 min

