NHS staff resist using Palantir software, reportedly citing ethics concerns, privacy worries, and doubt that the platform adds much...
submitted by /u/esporx [link] [comments]
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
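The failure mode described here follows directly from how RLHF reward models are fit. A minimal sketch, assuming the standard pairwise (Bradley–Terry) objective commonly used for reward modeling; the reward values below are illustrative, not from any real model:

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for fitting a reward model.

    The loss is small when the response humans chose scores higher
    than the one they rejected, so the model learns to reproduce
    whatever raters systematically prefer.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores: raters pick a confident, fluent answer over a
# hedged but more accurate one. The loss rewards that choice, so the
# learned reward signal comes to favor confident style over substance.
loss_style_wins = bradley_terry_loss(2.0, 0.5)   # raters preferred confidence
loss_style_loses = bradley_terry_loss(0.5, 2.0)  # raters preferred accuracy
```

Because `loss_style_wins` is much smaller than `loss_style_loses`, gradient descent pushes the reward model toward whichever trait raters consistently favor — which is exactly why a bias in human ratings becomes a bias in the trained policy.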
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper presents Clust-PSI-PFL, a novel framework for personalized federated learning that addresses challenges posed by non-IID data t...
This article presents a novel approach to training medical large language models (LLMs) through dialogue-based fine-tuning, improving the...
The paper discusses the limitations of current unlearning methods in large language models (LLMs), revealing that they fail to effectivel...
This paper explores the concept of causal effect identifiability, focusing on state-based effects and how they can be identifiable even w...
CleverCatch introduces a knowledge-guided weak supervision model for detecting healthcare fraud, enhancing accuracy and interpretability ...
This article presents a systematic study of medical interpretability in Large Language Models (LLMs), exploring how these models process ...
This paper presents a cost-efficient approach to auditing fairness in classifiers under partial feedback, proposing algorithms that outpe...
The paper introduces KVComm, a novel framework for efficient communication between Large Language Models (LLMs) using selective KV pair s...
Voxtral Realtime presents a novel streaming automatic speech recognition model achieving offline transcription quality with sub-second la...
This article explores how large reasoning models (LRMs) can implicitly determine when to stop processing information, introducing a new s...
This paper presents a novel framework for interpretable failure analysis in Multi-Agent Reinforcement Learning (MARL) systems, focusing o...
The paper introduces Diffusion Alignment as Variational Expectation-Maximization (DAV), a novel framework that optimizes diffusion models...
The paper introduces STCast, an AI-driven framework for adaptive boundary alignment in weather forecasting, enhancing regional forecasts ...
This study explores student perceptions of AI grading systems, focusing on fairness, trust, consistency, and transparency in an undergrad...
The paper discusses differentiable social choice, a framework integrating machine learning with social choice theory, identifying 18 open...
The paper introduces OffSeeker, a model demonstrating that offline training can effectively replace costly online reinforcement learning ...
This paper discusses the impact of uncertainty in ground truth evaluations on AI performance assessments, proposing a probabilistic frame...
The paper presents DriftLite, a lightweight approach for inference-time scaling of diffusion models, enhancing adaptation to new distribu...
This paper presents CERMIC, a novel framework for enhancing multi-agent exploration in reinforcement learning by calibrating intrinsic cu...
This paper introduces a benchmark for evaluating outcome-driven constraint violations in autonomous AI agents, highlighting safety concer...