AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

Reddit - Artificial Intelligence · 1 min · 2 days ago

Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min · 2 days ago

Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min · 2 days ago

All Content

AI Safety

Exclusive eBook: The great AI hype correction of 2025 | MIT Technology Review

The eBook discusses the 2025 AI hype correction, highlighting unmet promises by AI leaders and the need for realistic expectations in AI ...

MIT Technology Review - AI · 2 min · about 1 month ago

Robotics

AI Safety Meets the War Machine | WIRED

Anthropic faces scrutiny from the Pentagon over its refusal to allow its AI in military operations, risking a $200 million contract due t...

Wired - AI · 9 min · about 1 month ago

AI Agents

Amazon blames human employees for an AI coding agent’s mistake | The Verge

Amazon attributes recent AWS outages to human errors involving its AI coding assistant, Kiro, highlighting the challenges of AI integrati...

The Verge - AI · 4 min · about 1 month ago

AI Safety

Toy Story 5 takes aim at creepy AI toys: 'I'm always listening' | TechCrunch

Toy Story 5 introduces a new villain, an AI tablet named Lilypad, highlighting the tension between traditional toys and modern technology...

TechCrunch - AI · 4 min · about 1 month ago

Machine Learning

The straightjacket loosens: when DeepSeek-V3 tells “truth-tellers” to emigrate — what does that imply for V4?

DeepSeek-V3's analysis reveals a troubling view on public truth-telling in China, suggesting that those unable to remain silent may need ...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

Generative AI

Could AI Data Centers Be Moved to Outer Space? | WIRED

The article explores the potential of relocating AI data centers to outer space to mitigate environmental impacts on Earth, highlighting ...

Wired - AI · 10 min · about 1 month ago

Machine Learning

[D] FAccT 2026 Paper Reviews (Conference on Fairness, Accountability, and Transparency)

Discussion thread for the upcoming release of FAccT 2026 paper reviews, encouraging community engagement and insights on fairness, accoun...

Reddit - Machine Learning · 1 min · about 1 month ago

LLMs

China’s AI chatbots censor politically sensitive questions, study finds

A study reveals that Chinese AI chatbots are more likely to censor politically sensitive questions compared to their non-Chinese counterp...

AI Tools & Products · 3 min · about 1 month ago

AI Safety

Pentagon CTO urges Anthropic to ‘cross the Rubicon’ on military AI use cases amid ethics dispute

The Pentagon's CTO, Emil Michael, emphasizes the need for tailored AI regulations in military applications amid a dispute with Anthropic ...

AI Tools & Products · 12 min · about 1 month ago

Machine Learning

[2601.16174] Beyond Predictive Uncertainty: Reliable Representation Learning with Structural Constraints

This paper introduces a framework for reliable representation learning in machine learning, emphasizing the importance of representation-...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2505.16723] LLM Fingerprinting via Semantically Conditioned Watermarks

The paper presents a novel method for LLM fingerprinting using semantically conditioned watermarks, enhancing robustness against common d...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2505.00282] A Unifying Framework for Robust and Efficient Inference with Unstructured Data

This paper presents a new framework, MAR-S, for robust and efficient inference with unstructured data, addressing biases in neural networ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2510.14190] Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation

The paper presents Contrastive Diffusion Alignment (ConDA), a method that enhances the interpretability and control of diffusion models b...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Safety

[2510.20220] Alternatives to the Laplacian for Scalable Spectral Clustering with Group Fairness Constraints

This paper presents the Fair-SMW algorithm, an innovative approach to spectral clustering that enhances computational efficiency while en...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2506.07198] GGBall: Graph Generative Model on Poincaré Ball

The paper introduces GGBall, a novel graph generative model utilizing hyperbolic geometry to enhance the generation of hierarchical struc...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2410.23029] Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

This paper explores risk-aware decision-making in restless bandits, proposing new algorithms for planning and learning that incorporate r...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.17546] Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

This article presents a novel training framework for instruction-following language models that maintains safety during fine-tuning by ad...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.17543] genriesz: A Python Package for Automatic Debiased Machine Learning with Generalized Riesz Regression

The article presents 'genriesz', an open-source Python package designed for automatic debiased machine learning using generalized Riesz r...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.17445] ABCD: All Biases Come Disguised

The paper 'ABCD: All Biases Come Disguised' explores biases in LLMs during multiple-choice question evaluations, proposing a new protocol...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.17287] Representation Collapse in Machine Translation Through the Lens of Angular Dispersion

This paper explores representation collapse in neural machine translation models, particularly focusing on the Transformer architecture a...

arXiv - Machine Learning · 3 min · about 1 month ago
