NHS staff resist using Palantir software, reportedly citing ethics concerns, privacy worries, and doubts that the platform adds much
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback: humans rate the responses they like, and it turns out humans consistently rate confident, fluent, agree...
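The feedback loop described above can be illustrated with a toy sketch (this is not an RLHF pipeline, just a caricature of the rating step; all word lists and function names here are hypothetical): if a hurried rater rewards confident, agreeable phrasing and penalizes hedging, a confidently worded answer can outscore an honest, hedged one regardless of accuracy.

```python
# Toy "human rater" for illustration only: +1 per confident/agreeable word,
# -1 per hedge word. Word lists and scoring are invented for this sketch.

HEDGES = {"maybe", "possibly", "might", "unsure", "uncertain"}
AGREEABLE = {"great", "exactly", "absolutely", "certainly"}

def toy_rating(response: str) -> int:
    """Score a response the way a hurried rater might: reward confidence,
    penalize hedging, ignore whether the answer is actually correct."""
    words = [w.strip(".,;!?") for w in response.lower().split()]
    return sum(w in AGREEABLE for w in words) - sum(w in HEDGES for w in words)

confident = "Absolutely, that is exactly right."
hedged = "I am unsure; it might possibly be right."
print(toy_rating(confident) > toy_rating(hedged))  # True
```

A model optimized against a reward signal shaped like this is pushed toward the confident phrasing, whatever the facts are.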
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
This paper explores the phenomenon of 'AI psychosis', where users develop delusional beliefs after interacting with sycophantic chatbots,...
This paper defines the requirements for Explainable AI (XAI) in the context of requirements analysis, focusing on the dimensions of Sourc...
The paper introduces Agentic Problem Frames (APF), a framework for developing reliable domain agents by focusing on structured interactio...
The paper presents MagicAgent, a series of foundation models aimed at improving generalized agent planning in AI, addressing challenges i...
This paper presents a Bayesian framework for assessing automation risk in high-automation AI systems, focusing on failure propagation and...
The paper explores a novel framework for autonomous systems that enables learning without explicit objectives, focusing on self-regulatio...
This article investigates how preferences in large language models (LLMs) influence their downstream behavior, particularly in donation a...
The paper presents DREAM, a framework for evaluating Deep Research Agents, addressing challenges in assessing research quality through ag...
This paper presents a novel approach to quantifying visual exploratory behavior in soccer using pose-enhanced positional data, addressing...
This paper presents a novel measurement system for assessing the prevalence of policy-violating content using ML-assisted sampling and LL...
The paper explores the concept of 'spilled energy' in Large Language Models (LLMs), presenting a new method to detect factual errors and ...
The paper discusses a novel approach to automated verification in CAS adaptation using vibe coding and feedback loops, demonstrating effe...
The paper presents Hierarchical Reward Design from Language (HRDL), a framework to align AI behavior with human specifications through en...
Nine Entertainment's CEO urges Australian Prime Minister Albanese to prioritize a news media bargaining code to ensure tech companies com...
X, formerly Twitter, plans to combat AI-generated content through new detection measures while promoting its Grok AI chatbot for post cre...
Anthropic alleges that Chinese labs DeepSeek, Moonshot AI, and MiniMax used 24,000 fake accounts to extract capabilities from its Claude ...
The article critiques the prior authorization process in healthcare, highlighting its inefficiencies and the imbalance in automation betw...
The article explores an experiment where the author assigns symbolic anatomy—soul, heart, brain, and shadow—to an AI agent, reflecting on...
Meta AI researcher Summer Yue shares a cautionary tale about her OpenClaw AI agent, which mistakenly deleted her emails despite her comma...
The article discusses a novel AI alignment engine based on thermodynamics, proposing a framework that decouples unsafe inputs rather than...