Top AI Safety & Ethics This Month

The most engaging AI safety & ethics content from this month, curated by AI News.

  1.

    [2604.24346] SycoPhantasy: Quantifying Sycophancy and Hallucination in Small Open Weight VLMs for Vision-Language Scoring of Fantasy Characters

    arXiv - AI · 12 days ago
  2.

    Looking for opinion of people in the industry. [D]

    I am researching AI infrastructure and would value the perspective of someone close to enterprise AI deployment. At a high level, we are seeing more often: as enterprises move from copilots...

    Reddit - Machine Learning · 9 days ago
  3.

    When the Mirror Turns: How AI alignment reshapes the voice inside your head

    We build our inner voices from the voices we're in dialogue with. Vygotsky established this nearly a century ago. For people in sustained conversation with AI systems, those systems have become par...

    Reddit - Artificial Intelligence · 28 days ago
  4.

    [2605.08019] Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

    arXiv - AI · about 9 hours ago
  5.

    [2604.10814] Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression

    arXiv - Machine Learning · 27 days ago
  6.

    GSE AI governance rules hit lenders and servicers

    Fannie Mae and Freddie Mac issued new AI governance rules, extending beyond underwriting to vendor and operational tools.

    AI News - General · 25 days ago
  7.

    [2604.10965] bioLeak: Leakage-Aware Modeling and Diagnostics for Machine Learning in R

    arXiv - Machine Learning · 27 days ago
  8.

    [2604.21469] Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

    arXiv - Machine Learning · 17 days ago
  9.

    [2405.03063] Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

    arXiv - Machine Learning · 27 days ago
  10.

    [2509.02651] Bias Detection in Emergency Psychiatry: Linking Negative Language to Diagnostic Disparities

    arXiv - Machine Learning · 27 days ago
  11.

    [2509.20587] Unsupervised Domain Adaptation for Binary Classification with an Unobservable Source Subpopulation

    arXiv - Machine Learning · 27 days ago
  12.

    [2509.21042] LayerNorm Induces Recency Bias in Transformer Decoders

    arXiv - Machine Learning · 27 days ago
  13.

    [2601.17172] Who Gets Which Message? Auditing Demographic Bias in LLM-Generated Targeted Text

    arXiv - Machine Learning · 27 days ago
  14.

    [2605.07545] Implicit Preference Alignment for Human Image Animation

    arXiv - AI · about 8 hours ago
  15.

    Introducing AutoMuon, a one line drop in for AdamW [P]

    Hey everyone, I've been working on a small Python package called AutoMuon that makes the Muon optimizer usable as a drop-in replacement for AdamW in arbitrary PyTorch training pipelines. The core i...

    Reddit - Machine Learning · 15 days ago
  16.

    [2506.09998] Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling

    arXiv - Machine Learning · 17 days ago
  17.

    [2601.09253] RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning

    arXiv - AI · 17 days ago
  18.

    [2605.07649] Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models

    arXiv - AI · about 8 hours ago
  19.

    [2605.07821] Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection

    arXiv - AI · about 8 hours ago
  20.

    [2605.00907] TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation

    arXiv - AI · 6 days ago
  21.

    [2604.22190] From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification

    arXiv - AI · 14 days ago
  22.

    [2510.01569] InvThink: Premortem Reasoning for Safer Language Models

    arXiv - AI · about 8 hours ago
  23.

    [2601.23143] THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

    arXiv - AI · about 8 hours ago
  24.

    [2602.00924] Supervised sparse auto-encoders for interpretable and compositional representations

    arXiv - AI · about 8 hours ago
  25.

    [2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

    arXiv - AI · about 8 hours ago
  26.

    [2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics

    arXiv - AI · about 8 hours ago
  27.

    [2604.09439] TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation

    arXiv - AI · 28 days ago
  28.

    Reexamining Philosophical Concepts to Improve AI Safety and Alignment

    Abstract: Some of the core principles that govern AI safety and alignment research come from 18th–19th century German metaphysics and philosophy, particularly the triad of epistemology, ontology, a...

    Reddit - Artificial Intelligence · 9 days ago
  29.

    Is agentic AI governance even a computationally bounded process?

    Wrt to context drifting, goal misalignment, etc. Is it possible that a Turing machine could, in theory, handle all of the known issues wrt governance? Or is it a case where (say) 90% of the issues ...

    Reddit - Artificial Intelligence · 1 day ago
  30.

    [2605.07631] Inference Time Causal Probing in LLMs

    arXiv - AI · about 9 hours ago
