AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

AI Safety

Exclusive eBook: The great AI hype correction of 2025 | MIT Technology Review

The eBook discusses the 2025 AI hype correction, highlighting unmet promises by AI leaders and the need for realistic expectations in AI ...

MIT Technology Review - AI · 2 min ·
Robotics

AI Safety Meets the War Machine | WIRED

Anthropic faces scrutiny from the Pentagon over its refusal to allow its AI in military operations, risking a $200 million contract due t...

Wired - AI · 9 min ·
AI Agents

Amazon blames human employees for an AI coding agent’s mistake | The Verge

Amazon attributes recent AWS outages to human errors involving its AI coding assistant, Kiro, highlighting the challenges of AI integrati...

The Verge - AI · 4 min ·
AI Safety

Toy Story 5 takes aim at creepy AI toys: 'I'm always listening' | TechCrunch

Toy Story 5 introduces a new villain, an AI tablet named Lilypad, highlighting the tension between traditional toys and modern technology...

TechCrunch - AI · 4 min ·
Machine Learning

The straightjacket loosens: when DeepSeek-V3 tells “truth-tellers” to emigrate — what does that imply for V4?

DeepSeek-V3's analysis reveals a troubling view on public truth-telling in China, suggesting that those unable to remain silent may need ...

Reddit - Artificial Intelligence · 1 min ·
Generative AI

Could AI Data Centers Be Moved to Outer Space? | WIRED

The article explores the potential of relocating AI data centers to outer space to mitigate environmental impacts on Earth, highlighting ...

Wired - AI · 10 min ·
Machine Learning

[D] FAccT 2026 Paper Reviews (Conference on Fairness, Accountability, and Transparency)

Discussion thread for the upcoming release of FAccT 2026 paper reviews, encouraging community engagement and insights on fairness, accoun...

Reddit - Machine Learning · 1 min ·
LLMs

China’s AI chatbots censor politically sensitive questions, study finds

A study reveals that Chinese AI chatbots are more likely to censor politically sensitive questions compared to their non-Chinese counterp...

AI Tools & Products · 3 min ·
AI Safety

Pentagon CTO urges Anthropic to ‘cross the Rubicon’ on military AI use cases amid ethics dispute

The Pentagon's CTO, Emil Michael, emphasizes the need for tailored AI regulations in military applications amid a dispute with Anthropic ...

AI Tools & Products · 12 min ·
Machine Learning

[2601.16174] Beyond Predictive Uncertainty: Reliable Representation Learning with Structural Constraints

This paper introduces a framework for reliable representation learning in machine learning, emphasizing the importance of representation-...

arXiv - Machine Learning · 3 min ·
LLMs

[2505.16723] LLM Fingerprinting via Semantically Conditioned Watermarks

The paper presents a novel method for LLM fingerprinting using semantically conditioned watermarks, enhancing robustness against common d...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2505.00282] A Unifying Framework for Robust and Efficient Inference with Unstructured Data

This paper presents a new framework, MAR-S, for robust and efficient inference with unstructured data, addressing biases in neural networ...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2510.14190] Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation

The paper presents Contrastive Diffusion Alignment (ConDA), a method that enhances the interpretability and control of diffusion models b...

arXiv - Machine Learning · 4 min ·
AI Safety

[2510.20220] Alternatives to the Laplacian for Scalable Spectral Clustering with Group Fairness Constraints

This paper presents the Fair-SMW algorithm, an innovative approach to spectral clustering that enhances computational efficiency while en...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2506.07198] GGBall: Graph Generative Model on Poincaré Ball

The paper introduces GGBall, a novel graph generative model utilizing hyperbolic geometry to enhance the generation of hierarchical struc...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2410.23029] Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

This paper explores risk-aware decision-making in restless bandits, proposing new algorithms for planning and learning that incorporate r...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.17546] Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

This article presents a novel training framework for instruction-following language models that maintains safety during fine-tuning by ad...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17543] genriesz: A Python Package for Automatic Debiased Machine Learning with Generalized Riesz Regression

The article presents 'genriesz', an open-source Python package designed for automatic debiased machine learning using generalized Riesz r...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.17445] ABCD: All Biases Come Disguised

The paper 'ABCD: All Biases Come Disguised' explores biases in LLMs during multiple-choice question evaluations, proposing a new protocol...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17287] Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

This paper explores representation collapse in neural machine translation models, particularly focusing on the Transformer architecture a...

arXiv - Machine Learning · 3 min ·