Top AI Safety & Ethics This Month

The most engaging AI safety & ethics content from this month, curated by AI News.

  1.

    [2604.24346] SycoPhantasy: Quantifying Sycophancy and Hallucination in Small Open Weight VLMs for Vision-Language Scoring of Fantasy Characters

    arXiv - AI · 12 days ago
  2.

    Looking for opinion of people in the industry. [D]

    I am researching AI infrastructure and would value the perspective of someone close to enterprise AI deployment. At a high level, we are seeing more often: as enterprises move from copilots...

    Reddit - Machine Learning · 9 days ago
  3.

    When the Mirror Turns: How AI alignment reshapes the voice inside your head

    We build our inner voices from the voices we're in dialogue with. Vygotsky established this nearly a century ago. For people in sustained conversation with AI systems, those systems have become par...

    Reddit - Artificial Intelligence · 28 days ago
  4.

    [2605.08019] Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

    arXiv - AI · about 9 hours ago
  5.

    [2604.10814] Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression

    arXiv - Machine Learning · 27 days ago
  6.

    GSE AI governance rules hit lenders and servicers

    Fannie Mae and Freddie Mac issued new AI governance rules, extending beyond underwriting to vendor and operational tools.

    AI News - General · 25 days ago
  7.

    [2604.10965] bioLeak: Leakage-Aware Modeling and Diagnostics for Machine Learning in R

    arXiv - Machine Learning · 27 days ago
  8.

    [2604.21469] Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

    arXiv - Machine Learning · 17 days ago
  9.

    [2405.03063] Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

    arXiv - Machine Learning · 27 days ago
  10.

    [2509.02651] Bias Detection in Emergency Psychiatry: Linking Negative Language to Diagnostic Disparities

    arXiv - Machine Learning · 27 days ago
  11.

    [2509.20587] Unsupervised Domain Adaptation for Binary Classification with an Unobservable Source Subpopulation

    arXiv - Machine Learning · 27 days ago
  12.

    [2509.21042] LayerNorm Induces Recency Bias in Transformer Decoders

    arXiv - Machine Learning · 27 days ago
  13.

    [2601.17172] Who Gets Which Message? Auditing Demographic Bias in LLM-Generated Targeted Text

    arXiv - Machine Learning · 27 days ago
  14.

    [2605.07545] Implicit Preference Alignment for Human Image Animation

    arXiv - AI · about 8 hours ago
  15.

    Introducing AutoMuon, a one line drop in for AdamW [P]

    Hey everyone, I've been working on a small Python package called AutoMuon that makes the Muon optimizer usable as a drop-in replacement for AdamW in arbitrary PyTorch training pipelines. The core i...

    Reddit - Machine Learning · 15 days ago
  16.

    [2506.09998] Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

    arXiv - Machine Learning · 17 days ago
  17.

    [2601.09253] RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning

    arXiv - AI · 17 days ago
  18.

    [2605.07649] Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models

    arXiv - AI · about 8 hours ago
  19.

    [2605.07821] Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection

    arXiv - AI · about 8 hours ago
  20.

    [2605.00907] TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation

    arXiv - AI · 6 days ago
  21.

    [2604.22190] From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

    arXiv - AI · 14 days ago
  22.

    [2510.01569] InvThink: Premortem Reasoning for Safer Language Models

    arXiv - AI · about 8 hours ago
  23.

    [2601.23143] THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

    arXiv - AI · about 8 hours ago
  24.

    [2602.00924] Supervised sparse auto-encoders for interpretable and compositional representations

    arXiv - AI · about 8 hours ago
  25.

    [2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

    arXiv - AI · about 8 hours ago
  26.

    [2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics

    arXiv - AI · about 8 hours ago
  27.

    [2604.09439] TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation

    arXiv - AI · 28 days ago
  28.

    Reexamining Philosophical Concepts to Improve AI Safety and Alignment

    Abstract: Some of the core principles that govern AI safety and alignment research come from 18th–19th century German metaphysics and philosophy, particularly the triad of epistemology, ontology, a...

    Reddit - Artificial Intelligence · 9 days ago
  29.

    Is agentic AI governance even a computationally bounded process?

    Wrt to context drifting, goal misalignment, etc. Is it possible that a Turing machine could, in theory, handle all of the known issues wrt governance? Or is it a case where (say) 90% of the issues ...

    Reddit - Artificial Intelligence · 1 day ago
  30.

    [2605.07631] Inference Time Causal Probing in LLMs

    arXiv - AI · about 9 hours ago
