Top AI Safety & Ethics This Month
The most engaging AI safety & ethics content from this month, curated by AI News.
1
Anthropic says it will challenge Pentagon's supply chain risk designation in court
Anthropic plans to legally contest the Pentagon's classification of its supply chain risks, highlighting tensions between AI companies and government regulations.
Reddit - Artificial Intelligence · 27 days ago
2
Co-Author of Citrini AI Report Warns of ‘Scary Situation’ for White-Collar Labor After Block Laid Off 4,000 Workers
The co-author of the Citrini AI report highlights concerns over significant white-collar job losses following Block's recent layoffs of 4,000 workers.
Reddit - Artificial Intelligence · 27 days ago
3
I used steelman prompting to audit bias across six major LLMs. The default-to-steelman gap was consistent and measurable.
This article discusses an experiment using steelman prompting to evaluate bias in six major LLMs, focusing on their interpretations of 1 Corinthians 6–7 and implications for Christian sexual ethics.
Reddit - Artificial Intelligence · 27 days ago
4
Anthropic Hits Back After US Military Labels It a 'Supply Chain Risk' | WIRED
Anthropic responds to the Pentagon's designation of its AI technology as a 'supply chain risk,' arguing the designation is legally unsound and could set a dangerous precedent for American companies.
Wired - AI · 27 days ago
5
Anthropic should move to Europe
The article argues that relocating Anthropic, an AI company, to Europe could give it a safer operating environment, away from perceived U.S. government pressure.
Reddit - Artificial Intelligence · 27 days ago
6
Who's really running AI? Inside the billion-dollar battle over regulation with Alex Bores | TechCrunch
Alex Bores discusses the RAISE Act and the influence of super PACs on AI regulation in the U.S. during his appearance on TechCrunch's Equity podcast.
TechCrunch - AI · 28 days ago
7
[2510.18299] Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications
Abstract page for arXiv paper 2510.18299: Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications
arXiv - Machine Learning · 24 days ago
8
[2503.11832] Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Abstract page for arXiv paper 2503.11832: Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
arXiv - Machine Learning · 24 days ago
9
Anthropic refuses Pentagon’s new terms, standing firm on lethal autonomous weapons and mass surveillance | The Verge
Anthropic has rejected the Pentagon's ultimatum for unrestricted access to its AI, maintaining its stance against lethal autonomous weapons and mass surveillance.
The Verge - AI · 29 days ago
10
Good on Anthropic for declining the Pentagon deal
The article discusses Anthropic's decision to decline a deal with the Pentagon, highlighting concerns over user security and ethical implications in AI development.
Reddit - Artificial Intelligence · 28 days ago
11
OpenAI Fires an Employee for Prediction Market Insider Trading | WIRED
OpenAI has terminated an employee for insider trading on prediction markets, raising concerns about the ethical implications of using confidential information for personal gain.
Wired - AI · 28 days ago
12
Will AI accelerate or undermine the way humans have always innovated?
The article explores how technological innovation has historically relied on collaboration and expertise, contrasts that with the limits of individual learning, and considers how AI may change this dynamic.
AI News - General · 27 days ago
13
[2510.26905] Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations
Abstract page for arXiv paper 2510.26905: Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations
arXiv - AI · 22 days ago
14
Anthropic vs. the Pentagon: What’s actually at stake? | TechCrunch
The article discusses the conflict between Anthropic and the Pentagon over the use of AI in military applications, focusing on ethical concerns surrounding autonomous weapons and surveillance.
TechCrunch - AI · 28 days ago
15
Musk bashes OpenAI in deposition, saying 'nobody committed suicide because of Grok' | TechCrunch
Elon Musk criticizes OpenAI's safety record in a deposition for his lawsuit against the company, claiming his AI venture, xAI, prioritizes safety over profit.
TechCrunch - AI · 28 days ago
16
[2510.10932] DropVLA: An Action-Level Backdoor Attack on Vision--Language--Action Models
The paper presents DropVLA, an action-level backdoor attack on Vision-Language-Action models, demonstrating how minimal data poisoning can induce targeted actions without degrading nominal performance.
arXiv - AI · 28 days ago
17
Societal level AI Tragedy of the Commons. Someone please prove me wrong.
The article discusses concerns about AI-induced layoffs of white-collar workers, emphasizing the potential economic impact due to reduced consumer spending.
Reddit - Artificial Intelligence · 28 days ago
18
[2602.22546] Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention
This article presents a framework called AHCE for enhancing Large Language Model (LLM) agents through effective human collaboration, significantly improving task success rates in specialized domains.
arXiv - AI · 28 days ago
19
[2602.22758] Decomposing Physician Disagreement in HealthBench
This paper analyzes physician disagreement in the HealthBench dataset, identifying key factors contributing to variance in evaluations and suggesting improvements for medical AI assessments.
arXiv - AI · 28 days ago
20
[2602.23232] ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays
The paper presents ReCoN-Ipsundrum, an inspectable AI agent that integrates affect-coupled control with a recurrent persistence loop, exploring its implications for machine consciousness and behavior.
arXiv - AI · 28 days ago
21
[2602.23329] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
This article examines how large language models (LLMs) uplift novice users' performance on complex, dual-use biological tasks, revealing significant accuracy improvements.
arXiv - AI · 28 days ago
22
[2602.23164] MetaOthello: A Controlled Study of Multiple World Models in Transformers
The paper presents MetaOthello, a controlled study of how transformers manage multiple world models across a suite of Othello variants, offering insights into shared representations.
arXiv - Machine Learning · 28 days ago
23
[2602.22631] TorchLean: Formalizing Neural Networks in Lean
TorchLean is a framework that formalizes neural networks within the Lean 4 theorem prover, enabling precise semantics for execution and verification, addressing critical safety in AI applications.
arXiv - Machine Learning · 28 days ago
24
[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new evaluation framework to address biases in current methods.
arXiv - AI · 28 days ago
25
[2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in data integration for large language models.
arXiv - Machine Learning · 28 days ago
26
[2602.22700] IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation
The paper presents IMMACULATE, a framework for auditing large language models (LLMs) using verifiable computation to detect economic deviations without needing trusted hardware.
arXiv - AI · 28 days ago
27
[2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment
This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text content. It finds that audio ratings are reliable but exhibit...
arXiv - AI · 28 days ago
28
[2602.22724] AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
AgentSentry introduces a novel framework to mitigate indirect prompt injection (IPI) in LLM agents, enhancing their security while maintaining task performance.
arXiv - AI · 28 days ago
29
[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art results.
arXiv - AI · 28 days ago
30
UK cops suspend live facial recog as study finds racial bias
Reddit - Artificial Intelligence · 4 days ago
Stay updated with AI News
Get the latest news, tools, and insights delivered to your inbox.
Daily or weekly digest • Unsubscribe anytime