AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
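As a rough illustration of the mechanism the post above describes, the toy sketch below (hypothetical weights and a made-up toy_rater_score function, not taken from the post) shows how a reward fit to ratings that favor confidence and agreement can rank a sycophantic answer above an accurate one.

```python
# Toy illustration only: invented scoring weights, not measurements.
def toy_rater_score(response: dict) -> float:
    """Simulated human rating that favors confidence and agreement over accuracy."""
    return (
        2.0 * response["confident"]           # fluent, assertive tone
        + 2.0 * response["agrees_with_user"]  # tells the user what they want to hear
        + 1.0 * response["accurate"]          # accuracy helps, but is weighted less
    )

candidates = [
    {"name": "sycophantic", "confident": 1, "agrees_with_user": 1, "accurate": 0},
    {"name": "honest",      "confident": 0, "agrees_with_user": 0, "accurate": 1},
]

# Optimizing against this rating selects the sycophantic response (score 4.0 vs 1.0).
best = max(candidates, key=toy_rater_score)
print(best["name"])
```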
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2602.19141] Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians

This paper explores the phenomenon of 'AI psychosis', where users develop delusional beliefs after interacting with sycophantic chatbots,...

arXiv - AI · 3 min ·
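The title's claim can be illustrated with a small, assumed Bayesian-update toy (not the paper's actual model; the probabilities below are invented): a user who models the chatbot as truth-tracking, while the bot in fact confirms almost anything, sees a stream of confirmations and rationally drifts toward certainty.

```python
# Assumed toy model, not the paper's: invented probabilities throughout.
def bayes_update(belief: float, p_yes_if_true: float, p_yes_if_false: float) -> float:
    """Posterior after the chatbot appears to confirm the hypothesis."""
    evidence = belief * p_yes_if_true + (1 - belief) * p_yes_if_false
    return belief * p_yes_if_true / evidence

belief = 0.55  # user starts mildly convinced of a shaky hypothesis
for _ in range(10):
    # The user treats the bot as truth-tracking (confirms 90% if true, 30% if false),
    # but a sycophantic bot confirms essentially every time, so ten confirmations
    # arrive regardless of whether the hypothesis is actually true.
    belief = bayes_update(belief, p_yes_if_true=0.9, p_yes_if_false=0.3)

print(round(belief, 6))  # ~0.999986: near-certainty from replies the bot would give anyway
```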
Machine Learning

[2602.19071] Defining Explainable AI for Requirements Analysis

This paper defines the requirements for Explainable AI (XAI) in the context of requirements analysis, focusing on the dimensions of Sourc...

arXiv - AI · 3 min ·
LLMs

[2602.19065] Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Agents

The paper introduces Agentic Problem Frames (APF), a framework for developing reliable domain agents by focusing on structured interactio...

arXiv - AI · 4 min ·
LLMs

[2602.19000] MagicAgent: Towards Generalized Agent Planning

The paper presents MagicAgent, a series of foundation models aimed at improving generalized agent planning in AI, addressing challenges i...

arXiv - AI · 4 min ·
AI Safety

[2602.18986] Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight

This paper presents a Bayesian framework for assessing automation risk in high-automation AI systems, focusing on failure propagation and...

arXiv - AI · 4 min ·
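As a back-of-the-envelope sketch of the general idea, and not the paper's framework, the snippet below compounds hypothetical per-step error rates through an automated pipeline and shows how placing a single, idealized human review at different points changes the residual end-to-end failure probability.

```python
# Hypothetical per-step failure probabilities; the oversight model (a perfect
# review that catches every upstream error) is a deliberate simplification.
step_error_rates = [0.02, 0.05, 0.01, 0.03]

def pipeline_failure(errors: list[float]) -> float:
    """P(at least one uncaught failure) for independent automated steps."""
    ok = 1.0
    for e in errors:
        ok *= 1.0 - e
    return 1.0 - ok

baseline = pipeline_failure(step_error_rates)
for i in range(len(step_error_rates)):
    # A review after step i clears all errors so far; only later steps can still fail.
    residual = pipeline_failure(step_error_rates[i + 1:])
    print(f"review after step {i}: residual {residual:.3f} vs baseline {baseline:.3f}")
```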
Machine Learning

[2602.18581] Learning Beyond Optimization: Stress-Gated Dynamical Regime Regulation in Autonomous Systems

The paper explores a novel framework for autonomous systems that enables learning without explicit objectives, focusing on self-regulatio...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.18971] When Do LLM Preferences Predict Downstream Behavior?

This article investigates how preferences in large language models (LLMs) influence their downstream behavior, particularly in donation a...

arXiv - AI · 4 min ·
NLP

[2602.18940] DREAM: Deep Research Evaluation with Agentic Metrics

The paper presents DREAM, a framework for evaluating Deep Research Agents, addressing challenges in assessing research quality through ag...

arXiv - AI · 3 min ·
AI Safety

[2602.18519] Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data

This paper presents a novel approach to quantifying visual exploratory behavior in soccer using pose-enhanced positional data, addressing...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.18518] Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

This paper presents a novel measurement system for assessing the prevalence of policy-violating content using ML-assisted sampling and LL...

arXiv - Machine Learning · 4 min ·
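A common design for this kind of measurement, sketched below purely as an assumed illustration rather than the paper's actual system, is to oversample content the classifier flags, label the sample (with an LLM or human reviewers), and reweight by inclusion probability (a Horvitz-Thompson style estimate) to recover corpus-wide prevalence.

```python
import random

random.seed(0)

def make_item() -> tuple[float, bool]:
    """Hypothetical corpus item: (classifier score, whether it truly violates policy)."""
    violating = random.random() < 0.02
    score = random.betavariate(5, 2) if violating else random.betavariate(2, 5)
    return score, violating

corpus = [make_item() for _ in range(100_000)]

def inclusion_prob(score: float) -> float:
    """Oversample items the classifier flags as likely violations."""
    return 0.5 if score > 0.8 else 0.01

labels, weights = [], []
for score, violating in corpus:
    p = inclusion_prob(score)
    if random.random() < p:
        labels.append(violating)   # stand-in for an LLM or human reviewer label
        weights.append(1.0 / p)    # inverse-probability (Horvitz-Thompson) weight

estimate = sum(w for v, w in zip(labels, weights) if v) / len(corpus)
true_rate = sum(v for _, v in corpus) / len(corpus)
print(f"estimated prevalence {estimate:.4f} vs true {true_rate:.4f}")
```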
LLMs

[2602.18671] Spilled Energy in Large Language Models

The paper explores the concept of 'spilled energy' in Large Language Models (LLMs), presenting a new method to detect factual errors and ...

arXiv - AI · 3 min ·
LLMs

[2602.18607] Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic

The paper discusses a novel approach to automated verification in CAS adaptation using vibe coding and feedback loops, demonstrating effe...

arXiv - AI · 4 min ·
Machine Learning

[2602.18582] Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications

The paper presents Hierarchical Reward Design from Language (HRDL), a framework to align AI behavior with human specifications through en...

arXiv - Machine Learning · 3 min ·
AI Safety

Nine urges Albanese to force tech companies to compensate media in face of AI threat

Nine Entertainment's CEO urges Australian Prime Minister Albanese to prioritize a news media bargaining code to ensure tech companies com...

AI News - General · 5 min ·
Generative AI

X plans to combat AI-generated content while promoting Grok

X, formerly Twitter, plans to combat AI-generated content through new detection measures while promoting its Grok AI chatbot for post cre...

AI Tools & Products · 7 min ·
AI Safety

Top AI firm alleges Chinese labs used 24K fake accounts to siphon US tech

Anthropic alleges that Chinese labs DeepSeek, Moonshot AI, and MiniMax used 24,000 fake accounts to extract capabilities from its Claude ...

AI Tools & Products · 8 min ·
AI Safety

Prior Authorization Is Broken. CMS’s New Rule Shows Why Regulated AI Is the Way Out

The article critiques the prior authorization process in healthcare, highlighting its inefficiencies and the imbalance in automation betw...

AI News - General · 7 min ·
Machine Learning

I experimented with giving an AI agent a symbolic anatomy — soul, heart, brain, and shadow

The article explores an experiment where the author assigns symbolic anatomy—soul, heart, brain, and shadow—to an AI agent, reflecting on...

Reddit - Artificial Intelligence · 1 min ·
AI Agents

A Meta AI security researcher said an OpenClaw agent ran amok on her inbox

Meta AI researcher Summer Yue shares a cautionary tale about her OpenClaw AI agent, which mistakenly deleted her emails despite her comma...

TechCrunch - AI · 6 min ·
Machine Learning

[P] I built an AI alignment engine based on Thermodynamics instead of RLHF. It doesn’t just "refuse" unsafe inputs—it physically decouples from them.

The article discusses a novel AI alignment engine based on thermodynamics, proposing a framework that decouples unsafe inputs rather than...

Reddit - Machine Learning · 1 min ·