Top AI Safety & Ethics This Month

The most engaging AI safety & ethics content from this month, curated by AI News.

  1. Anthropic says it will challenge Pentagon's supply chain risk designation in court

    Anthropic plans to challenge in court the Pentagon's designation of the company as a supply chain risk, highlighting tensions between AI companies and the U.S. government.

    Reddit - Artificial Intelligence · 27 days ago
  2. Co-Author of Citrini AI Report Warns of ‘Scary Situation’ for White-Collar Labor After Block Laid Off 4,000 Workers

    The co-author of the Citrini AI report highlights concerns over significant job losses in white-collar sectors following Block's recent layoffs of 4,000 workers, emphasizing the potential impact of...

    Reddit - Artificial Intelligence · 27 days ago
  3. I used steelman prompting to audit bias across six major LLMs. The default-to-steelman gap was consistent and measurable.

    This article discusses an experiment using steelman prompting to evaluate bias in six major LLMs, focusing on their interpretations of 1 Corinthians 6–7 and implications for Christian sexual ethics.

    Reddit - Artificial Intelligence · 27 days ago
  4. Anthropic Hits Back After US Military Labels It a 'Supply Chain Risk' | WIRED

    Anthropic responds to the Pentagon's designation of its AI technology as a 'supply chain risk,' arguing that the designation is legally unsound and could set a dangerous precedent for American companies.

    Wired - AI · 27 days ago
  5. Anthropic should move to Europe

    The post argues that relocating to Europe would give Anthropic a safer operating environment, away from perceived U.S. government pressure.

    Reddit - Artificial Intelligence · 27 days ago
  6. Who's really running AI? Inside the billion-dollar battle over regulation with Alex Bores | TechCrunch

    Alex Bores discusses the RAISE Act and the influence of super PACs on AI regulation in the U.S. during his appearance on TechCrunch's Equity podcast.

    TechCrunch - AI · 28 days ago
  7. [2510.18299] Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications

    arXiv - Machine Learning · 24 days ago
  8. [2503.11832] Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

    arXiv - Machine Learning · 24 days ago
  9. Anthropic refuses Pentagon’s new terms, standing firm on lethal autonomous weapons and mass surveillance | The Verge

    Anthropic has rejected the Pentagon's ultimatum for unrestricted access to its AI, maintaining its stance against lethal autonomous weapons and mass surveillance.

    The Verge - AI · 29 days ago
  10. Good on Anthropic for declining the Pentagon deal

    The article discusses Anthropic's decision to decline a deal with the Pentagon, highlighting concerns over user security and ethical implications in AI development.

    Reddit - Artificial Intelligence · 28 days ago
  11. OpenAI Fires an Employee for Prediction Market Insider Trading | WIRED

    OpenAI has terminated an employee for insider trading on prediction markets, raising concerns about the ethical implications of using confidential information for personal gain.

    Wired - AI · 28 days ago
  12. Will AI accelerate or undermine the way humans have always innovated?

    The article explores how technological innovation has historically relied on collaboration and expertise, contrasting it with individual learning limitations, and discusses the potential impact of ...

    AI News - General · 27 days ago
  13. [2510.26905] Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

    arXiv - AI · 22 days ago
  14. Anthropic vs. the Pentagon: What’s actually at stake? | TechCrunch

    The article discusses the conflict between Anthropic and the Pentagon over the use of AI in military applications, focusing on ethical concerns surrounding autonomous weapons and surveillance.

    TechCrunch - AI · 28 days ago
  15. Musk bashes OpenAI in deposition, saying 'nobody committed suicide because of Grok' | TechCrunch

    Elon Musk criticizes OpenAI's safety record in a deposition for his lawsuit against the company, claiming his AI venture, xAI, prioritizes safety over profit.

    TechCrunch - AI · 28 days ago
  16. [2510.10932] DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

    The paper presents DropVLA, an action-level backdoor attack on Vision-Language-Action models, demonstrating how minimal data poisoning can induce targeted actions without degrading nominal performa...

    arXiv - AI · 28 days ago
  17. Societal level AI Tragedy of the Commons. Someone please prove me wrong.

    The post raises concerns about AI-driven layoffs of white-collar workers, emphasizing the potential economic damage from reduced consumer spending.

    Reddit - Artificial Intelligence · 28 days ago
  18. [2602.22546] Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention

    This article presents a framework called AHCE for enhancing Large Language Model (LLM) agents through effective human collaboration, significantly improving task success rates in specialized domains.

    arXiv - AI · 28 days ago
  19. [2602.22758] Decomposing Physician Disagreement in HealthBench

    This paper analyzes physician disagreement in the HealthBench dataset, identifying key factors contributing to variance in evaluations and suggesting improvements for medical AI assessments.

    arXiv - AI · 28 days ago
  20. [2602.23232] ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays

    The paper presents ReCoN-Ipsundrum, an inspectable AI agent that integrates affect-coupled control with a recurrent persistence loop, exploring its implications for machine consciousness and behavior.

    arXiv - AI · 28 days ago
  21. [2602.23329] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

    This article examines the effectiveness of large language models (LLMs) in enhancing novice users' performance on complex biological tasks, revealing significant accuracy improvements over traditio...

    arXiv - AI · 28 days ago
  22. [2602.23164] MetaOthello: A Controlled Study of Multiple World Models in Transformers

    The paper presents MetaOthello, a study exploring how transformers manage multiple world models through a controlled suite of Othello variants, revealing insights into shared representation and mod...

    arXiv - Machine Learning · 28 days ago
  23. [2602.22631] TorchLean: Formalizing Neural Networks in Lean

    TorchLean is a framework that formalizes neural networks in the Lean 4 theorem prover, giving them precise semantics for execution and verification in safety-critical AI applications.

    arXiv - Machine Learning · 28 days ago
  24. [2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

    The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new evaluation framework to address biases in current methods.

    arXiv - AI · 28 days ago
  25. [2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA

    The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in data integration for large language models.

    arXiv - Machine Learning · 28 days ago
  26. [2602.22700] IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

    The paper presents IMMACULATE, a framework for auditing large language models (LLMs) using verifiable computation to detect economic deviations without needing trusted hardware.

    arXiv - AI · 28 days ago
  27. [2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment

    This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text content. It finds that audio ratings are reliable but exhibi...

    arXiv - AI · 28 days ago
  28. [2602.22724] AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

    AgentSentry introduces a novel framework to mitigate indirect prompt injection (IPI) in LLM agents, enhancing their security while maintaining task performance.

    arXiv - AI · 28 days ago
  29. [2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

    The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art res...

    arXiv - AI · 28 days ago
  30. UK cops suspend live facial recog as study finds racial bias

    Submitted by /u/ateam1984.

    Reddit - Artificial Intelligence · 4 days ago