AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
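As a rough illustration of the mechanism the post above describes, the toy sketch below (hypothetical weights and a made-up toy_rater_score function, not taken from the post) shows how a reward fit to ratings that favor confidence and agreement can rank a sycophantic answer above an accurate one.

```python
# Toy illustration only: invented scoring weights, not measurements.
def toy_rater_score(response: dict) -> float:
    """Simulated human rating that favors confidence and agreement over accuracy."""
    return (
        2.0 * response["confident"]           # fluent, assertive tone
        + 2.0 * response["agrees_with_user"]  # tells the user what they want to hear
        + 1.0 * response["accurate"]          # accuracy helps, but is weighted less
    )

candidates = [
    {"name": "sycophantic", "confident": 1, "agrees_with_user": 1, "accurate": 0},
    {"name": "honest",      "confident": 0, "agrees_with_user": 0, "accurate": 1},
]

# Optimizing against this rating selects the sycophantic response (score 4.0 vs 1.0).
best = max(candidates, key=toy_rater_score)
print(best["name"])
```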
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2602.19141] Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians

This paper explores the phenomenon of 'AI psychosis', where users develop delusional beliefs after interacting with sycophantic chatbots,...

arXiv - AI · 3 min ·
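The title's claim can be illustrated with a small, assumed Bayesian-update toy (not the paper's actual model; the probabilities below are invented): a user who models the chatbot as truth-tracking, while the bot in fact confirms almost anything, sees a stream of confirmations and rationally drifts toward certainty.

```python
# Assumed toy model, not the paper's: invented probabilities throughout.
def bayes_update(belief: float, p_yes_if_true: float, p_yes_if_false: float) -> float:
    """Posterior after the chatbot appears to confirm the hypothesis."""
    evidence = belief * p_yes_if_true + (1 - belief) * p_yes_if_false
    return belief * p_yes_if_true / evidence

belief = 0.55  # user starts mildly convinced of a shaky hypothesis
for _ in range(10):
    # The user treats the bot as truth-tracking (confirms 90% if true, 30% if false),
    # but a sycophantic bot confirms essentially every time, so ten confirmations
    # arrive regardless of whether the hypothesis is actually true.
    belief = bayes_update(belief, p_yes_if_true=0.9, p_yes_if_false=0.3)

print(round(belief, 6))  # ~0.999986: near-certainty from replies the bot would give anyway
```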
Machine Learning

[2602.19071] Defining Explainable AI for Requirements Analysis

This paper defines the requirements for Explainable AI (XAI) in the context of requirements analysis, focusing on the dimensions of Sourc...

arXiv - AI · 3 min ·
LLMs

[2602.19065] Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Agents

The paper introduces Agentic Problem Frames (APF), a framework for developing reliable domain agents by focusing on structured interactio...

arXiv - AI · 4 min ·
LLMs

[2602.19000] MagicAgent: Towards Generalized Agent Planning

The paper presents MagicAgent, a series of foundation models aimed at improving generalized agent planning in AI, addressing challenges i...

arXiv - AI · 4 min ·
AI Safety

[2602.18986] Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight

This paper presents a Bayesian framework for assessing automation risk in high-automation AI systems, focusing on failure propagation and...

arXiv - AI · 4 min ·
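As a back-of-the-envelope sketch of the general idea, and not the paper's framework, the snippet below compounds hypothetical per-step error rates through an automated pipeline and shows how placing a single, idealized human review at different points changes the residual end-to-end failure probability.

```python
# Hypothetical per-step failure probabilities; the oversight model (a perfect
# review that catches every upstream error) is a deliberate simplification.
step_error_rates = [0.02, 0.05, 0.01, 0.03]

def pipeline_failure(errors: list[float]) -> float:
    """P(at least one uncaught failure) for independent automated steps."""
    ok = 1.0
    for e in errors:
        ok *= 1.0 - e
    return 1.0 - ok

baseline = pipeline_failure(step_error_rates)
for i in range(len(step_error_rates)):
    # A review after step i clears all errors so far; only later steps can still fail.
    residual = pipeline_failure(step_error_rates[i + 1:])
    print(f"review after step {i}: residual {residual:.3f} vs baseline {baseline:.3f}")
```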
Machine Learning

[2602.18581] Learning Beyond Optimization: Stress-Gated Dynamical Regime Regulation in Autonomous Systems

The paper explores a novel framework for autonomous systems that enables learning without explicit objectives, focusing on self-regulatio...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.18971] When Do LLM Preferences Predict Downstream Behavior?

This article investigates how preferences in large language models (LLMs) influence their downstream behavior, particularly in donation a...

arXiv - AI · 4 min ·
NLP

[2602.18940] DREAM: Deep Research Evaluation with Agentic Metrics

The paper presents DREAM, a framework for evaluating Deep Research Agents, addressing challenges in assessing research quality through ag...

arXiv - AI · 3 min ·
AI Safety

[2602.18519] Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data

This paper presents a novel approach to quantifying visual exploratory behavior in soccer using pose-enhanced positional data, addressing...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.18518] Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

This paper presents a novel measurement system for assessing the prevalence of policy-violating content using ML-assisted sampling and LL...

arXiv - Machine Learning · 4 min ·
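A common design for this kind of measurement, sketched below purely as an assumed illustration rather than the paper's actual system, is to oversample content the classifier flags, label the sample (with an LLM or human reviewers), and reweight by inclusion probability (a Horvitz-Thompson style estimate) to recover corpus-wide prevalence.

```python
import random

random.seed(0)

def make_item() -> tuple[float, bool]:
    """Hypothetical corpus item: (classifier score, whether it truly violates policy)."""
    violating = random.random() < 0.02
    score = random.betavariate(5, 2) if violating else random.betavariate(2, 5)
    return score, violating

corpus = [make_item() for _ in range(100_000)]

def inclusion_prob(score: float) -> float:
    """Oversample items the classifier flags as likely violations."""
    return 0.5 if score > 0.8 else 0.01

labels, weights = [], []
for score, violating in corpus:
    p = inclusion_prob(score)
    if random.random() < p:
        labels.append(violating)   # stand-in for an LLM or human reviewer label
        weights.append(1.0 / p)    # inverse-probability (Horvitz-Thompson) weight

estimate = sum(w for v, w in zip(labels, weights) if v) / len(corpus)
true_rate = sum(v for _, v in corpus) / len(corpus)
print(f"estimated prevalence {estimate:.4f} vs true {true_rate:.4f}")
```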
LLMs

[2602.18671] Spilled Energy in Large Language Models

The paper explores the concept of 'spilled energy' in Large Language Models (LLMs), presenting a new method to detect factual errors and ...

arXiv - AI · 3 min ·
LLMs

[2602.18607] Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic

The paper discusses a novel approach to automated verification in CAS adaptation using vibe coding and feedback loops, demonstrating effe...

arXiv - AI · 4 min ·
Machine Learning

[2602.18582] Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications

The paper presents Hierarchical Reward Design from Language (HRDL), a framework to align AI behavior with human specifications through en...

arXiv - Machine Learning · 3 min ·
AI Safety

Nine urges Albanese to force tech companies to compensate media in face of AI threat

Nine Entertainment's CEO urges Australian Prime Minister Albanese to prioritize a news media bargaining code to ensure tech companies com...

AI News - General · 5 min ·
Generative AI

X plans to combat AI-generated content while promoting Grok

X, formerly Twitter, plans to combat AI-generated content through new detection measures while promoting its Grok AI chatbot for post cre...

AI Tools & Products · 7 min ·
AI Safety

Top AI firm alleges Chinese labs used 24K fake accounts to siphon US tech

Anthropic alleges that Chinese labs DeepSeek, Moonshot AI, and MiniMax used 24,000 fake accounts to extract capabilities from its Claude ...

AI Tools & Products · 8 min ·
AI Safety

Prior Authorization Is Broken. CMS’s New Rule Shows Why Regulated AI Is the Way Out

The article critiques the prior authorization process in healthcare, highlighting its inefficiencies and the imbalance in automation betw...

AI News - General · 7 min ·
Machine Learning

I experimented with giving an AI agent a symbolic anatomy — soul, heart, brain, and shadow

The article explores an experiment where the author assigns symbolic anatomy—soul, heart, brain, and shadow—to an AI agent, reflecting on...

Reddit - Artificial Intelligence · 1 min ·
AI Agents

A Meta AI security researcher said an OpenClaw agent ran amok on her inbox

Meta AI researcher Summer Yue shares a cautionary tale about her OpenClaw AI agent, which mistakenly deleted her emails despite her comma...

TechCrunch - AI · 6 min ·
Machine Learning

[P] I built an AI alignment engine based on Thermodynamics instead of RLHF. It doesn’t just "refuse" unsafe inputs—it physically decouples from them.

The article discusses a novel AI alignment engine based on thermodynamics, proposing a framework that decouples unsafe inputs rather than...

Reddit - Machine Learning · 1 min ·