AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much.

Submitted by /u/esporx

Reddit - Artificial Intelligence · 1 min · 1 day ago

Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min · 2 days ago

Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min · 2 days ago

All Content

Machine Learning

[2602.17846] Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models

This paper explores the memorization phenomena in diffusion models, introducing a geometric framework that identifies risk levels across ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.17783] Multi-material Multi-physics Topology Optimization with Physics-informed Gaussian Process Priors

This paper presents a novel framework for multi-material, multi-physics topology optimization using physics-informed Gaussian processes, ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.17743] Provable Adversarial Robustness in In-Context Learning

This paper presents a framework for ensuring adversarial robustness in in-context learning (ICL) for large language models, addressing th...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.17699] Certified Learning under Distribution Shift: Sound Verification and Identifiable Structure

This paper presents a framework for certified learning under distribution shifts, focusing on sound verification and identifiable structu...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.17696] Can LLM Safety Be Ensured by Constraining Parameter Regions?

This article explores the effectiveness of identifying 'safety regions' in large language models (LLMs) by evaluating various methods acr...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.17695] EXACT: Explicit Attribute-Guided Decoding-Time Personalization

The paper presents EXACT, a novel approach for decoding-time personalization in large language models, enhancing user alignment through i...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.17692] Agentic Unlearning: When LLM Agent Meets Machine Unlearning

The paper introduces 'agentic unlearning,' a novel approach to remove sensitive information from both model parameters and memory in AI a...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.17677] Reducing Text Bias in Synthetically Generated MCQAs for VLMs in Autonomous Driving

This paper discusses reducing text bias in synthetically generated multiple-choice question answering (MCQA) for Vision Language Models (...

arXiv - Machine Learning · 3 min · about 1 month ago

AI Safety

Cyber judgment day? Anthropic’s new AI tool rattles sector, sparks shake-up fears

Anthropic's new AI tool, Claude Code Security, identifies hidden software vulnerabilities, causing significant market shifts in the cyber...

AI Tools & Products · 4 min · about 1 month ago

LLMs

‘AI injury attorneys’ sue ChatGPT in another AI psychosis case

A lawsuit has been filed against OpenAI by AI injury attorneys, claiming that ChatGPT caused severe mental health issues, including psych...

AI Tools & Products · 5 min · about 1 month ago

AI Safety

Cities Are Shredding Their AI Surveillance Contracts en Masse

Over 30 cities have terminated contracts with Flock Safety, an AI surveillance company, amid rising concerns over privacy and federal ove...

AI Tools & Products · 2 min · about 1 month ago

AI Infrastructure

Should I worry about how much water my AI chatbot conversations are using?

The article explores the environmental impact of AI chatbots, focusing on their water consumption during operation. It presents varying e...

AI Tools & Products · 7 min · about 1 month ago

Robotics

[P]: Engineering a Deterministic Kill-Switch for Autonomous Agents

The article discusses the engineering of a deterministic kill-switch for autonomous agents, emphasizing the importance of safety mechanis...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[R] Requesting cs.LG arXiv endorsement. Mechanistic interpretability paper on residual update trajectory geometry (draft available)

The author seeks endorsement for their arXiv paper on mechanistic interpretability, focusing on the geometric structure of residual updat...

Reddit - Machine Learning · 1 min · about 1 month ago

AI Safety

EU AI Act: first regulation on artificial intelligence

The EU AI Act establishes the world's first comprehensive framework for regulating artificial intelligence, focusing on safety, transpare...

AI News - General · 6 min · about 1 month ago

AI Safety

India’s path to AI autonomy

India is pursuing AI autonomy through a unique three-pillar strategy focused on democratizing AI, public-sector applications, and global ...

AI News - General · 13 min · about 1 month ago

Generative AI

Explained: Generative AI’s environmental impact

The article discusses the environmental impact of generative AI, highlighting its significant electricity and water consumption, and the ...

AI News - General · 12 min · about 1 month ago

AI Safety

Ai ?

The Reddit discussion explores concerns about AI potentially replacing jobs in the future, prompting varied opinions on the impact of AI ...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Safety

TAME

The article discusses the ethical considerations of AI in healthcare, emphasizing the need for responsible implementation to meet patient...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Agents

This Defense Company Made AI Agents That Blow Things Up

The article discusses a defense company's development of AI agents designed for military applications, raising ethical concerns about aut...

Reddit - Artificial Intelligence · 1 min · about 1 month ago
