NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
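The failure mode described above follows from how reward models are typically fit to human ratings. A minimal sketch (assuming the standard Bradley-Terry pairwise-preference loss; the scalar rewards here are illustrative, not from any real model):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss commonly used to train RLHF reward models:
    -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the model scores the human-preferred
    response higher than the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If raters consistently prefer confident, fluent answers, gradient
# descent on this loss rewards that *style*, independent of accuracy.
easy = preference_loss(2.0, 0.5)   # preferred answer already ranked higher -> small loss
hard = preference_loss(0.5, 2.0)   # preferred answer ranked lower -> large corrective loss
print(easy, hard)
```

Because the loss depends only on which response raters preferred, any systematic rater bias (toward confidence, fluency, or agreeableness) is baked directly into the learned reward.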
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
This paper presents a hybrid approach to detecting clickbait using large language models and informativeness measures, achieving a high F...
The paper presents FENCE, a bilingual multimodal dataset designed for detecting jailbreaks in financial applications, highlighting vulner...
This paper explores optimal data collection strategies from biased and costly sources, focusing on maximizing effective sample size under...
The paper presents TFL, a targeted bit-flip attack framework for large language models (LLMs) that allows precise manipulation of outputs...
The paper introduces OODBench, a benchmark for evaluating large vision-language models' performance on out-of-distribution (OOD) data, hi...
This article explores how perceived political bias in large language models (LLMs) can diminish their effectiveness in persuasion, reveal...
This article presents a framework for operational certification in conformal predictors, focusing on trade-offs beyond mere coverage, and...
This paper discusses the evolution of AI evaluation from static models to dynamic agents, emphasizing the need for standardized evaluatio...
The paper introduces DeepSVU, a novel approach for Security-oriented Video Understanding that identifies threats and evaluates their caus...
The paper introduces CLUTCH, a novel model for generating hand motions from text, leveraging a new dataset and advanced techniques to imp...
This article examines the limitations of machine learning in materials discovery, highlighting that high performance on benchmarks may st...
The paper presents PenTiDef, a novel framework designed to enhance privacy and robustness in decentralized federated intrusion detection ...
The paper presents ROCKET, a novel framework for enhancing Vision-Language-Action models by employing residual-oriented multi-layer align...
This study presents a hybrid modeling framework that combines scientific knowledge with machine learning to improve vessel power predicti...
The paper presents PRISM-FCP, a Byzantine-resilient framework for federated conformal prediction that enhances robustness against attacks...
This paper explores the unreliability of steering vectors in language models, examining how geometric predictors and linear approximation...
The paper presents MultiVer, a zero-shot multi-agent system for vulnerability detection that outperforms fine-tuned models in recall, ach...
This paper examines the 'induction bias' in sequence models, focusing on the limitations of transformer-based models in state tracking co...
This paper explores the fine-grained knowledge capabilities of vision-language models (VLMs), highlighting their performance on visual qu...
This paper explores the monitorability of chain-of-thought (CoT) systems in LLMs using information theory, identifying errors that affect...