AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

submitted by /u/esporx

Reddit - Artificial Intelligence · 1 min
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min

All Content

[2602.17846] Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models
Machine Learning

This paper explores the memorization phenomena in diffusion models, introducing a geometric framework that identifies risk levels across ...

arXiv - Machine Learning · 4 min
[2602.17783] Multi-material Multi-physics Topology Optimization with Physics-informed Gaussian Process Priors
Machine Learning

This paper presents a novel framework for multi-material, multi-physics topology optimization using physics-informed Gaussian processes, ...

arXiv - Machine Learning · 4 min
[2602.17743] Provable Adversarial Robustness in In-Context Learning
LLMs

This paper presents a framework for ensuring adversarial robustness in in-context learning (ICL) for large language models, addressing th...

arXiv - Machine Learning · 3 min
[2602.17699] Certified Learning under Distribution Shift: Sound Verification and Identifiable Structure
Machine Learning

This paper presents a framework for certified learning under distribution shifts, focusing on sound verification and identifiable structu...

arXiv - Machine Learning · 3 min
[2602.17696] Can LLM Safety Be Ensured by Constraining Parameter Regions?
LLMs

This article explores the effectiveness of identifying 'safety regions' in large language models (LLMs) by evaluating various methods acr...

arXiv - Machine Learning · 3 min
[2602.17695] EXACT: Explicit Attribute-Guided Decoding-Time Personalization
LLMs

The paper presents EXACT, a novel approach for decoding-time personalization in large language models, enhancing user alignment through i...

arXiv - Machine Learning · 3 min
[2602.17692] Agentic Unlearning: When LLM Agent Meets Machine Unlearning
LLMs

The paper introduces 'agentic unlearning,' a novel approach to remove sensitive information from both model parameters and memory in AI a...

arXiv - Machine Learning · 3 min
[2602.17677] Reducing Text Bias in Synthetically Generated MCQAs for VLMs in Autonomous Driving
LLMs

This paper discusses reducing text bias in synthetically generated multiple-choice question answering (MCQA) for Vision Language Models (...

arXiv - Machine Learning · 3 min
Cyber judgment day? Anthropic’s new AI tool rattles sector, sparks shake-up fears
AI Safety

Anthropic's new AI tool, Claude Code Security, identifies hidden software vulnerabilities, causing significant market shifts in the cyber...

AI Tools & Products · 4 min
‘AI injury attorneys’ sue ChatGPT in another AI psychosis case
LLMs

A lawsuit has been filed against OpenAI by AI injury attorneys, claiming that ChatGPT caused severe mental health issues, including psych...

AI Tools & Products · 5 min
Cities Are Shredding Their AI Surveillance Contracts en Masse
AI Safety

Over 30 cities have terminated contracts with Flock Safety, an AI surveillance company, amid rising concerns over privacy and federal ove...

AI Tools & Products · 2 min
Should I worry about how much water my AI chatbot conversations are using?
AI Infrastructure

The article explores the environmental impact of AI chatbots, focusing on their water consumption during operation. It presents varying e...

AI Tools & Products · 7 min
Robotics

[P]: Engineering a Deterministic Kill-Switch for Autonomous Agents

The article discusses the engineering of a deterministic kill-switch for autonomous agents, emphasizing the importance of safety mechanis...

Reddit - Machine Learning · 1 min
Machine Learning

[R] Requesting cs.LG arXiv endorsement. Mechanistic interpretability paper on residual update trajectory geometry (draft available)

The author seeks endorsement for their arXiv paper on mechanistic interpretability, focusing on the geometric structure of residual updat...

Reddit - Machine Learning · 1 min
EU AI Act: first regulation on artificial intelligence
AI Safety

The EU AI Act establishes the world's first comprehensive framework for regulating artificial intelligence, focusing on safety, transpare...

AI News - General · 6 min
India’s path to AI autonomy
AI Safety

India is pursuing AI autonomy through a unique three-pillar strategy focused on democratizing AI, public-sector applications, and global ...

AI News - General · 13 min
Explained: Generative AI’s environmental impact
Generative AI

The article discusses the environmental impact of generative AI, highlighting its significant electricity and water consumption, and the ...

AI News - General · 12 min
AI Safety

Ai ?

The Reddit discussion explores concerns about AI potentially replacing jobs in the future, prompting varied opinions on the impact of AI ...

Reddit - Artificial Intelligence · 1 min
AI Safety

TAME

The article discusses the ethical considerations of AI in healthcare, emphasizing the need for responsible implementation to meet patient...

Reddit - Artificial Intelligence · 1 min
AI Agents

This Defense Company Made AI Agents That Blow Things Up

The article discusses a defense company's development of AI agents designed for military applications, raising ethical concerns about aut...

Reddit - Artificial Intelligence · 1 min