NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
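The dynamic described here can be made concrete with the pairwise preference loss commonly used to train RLHF reward models. A minimal sketch (hypothetical, not taken from the linked post; the scores are illustrative):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model already scores the
    human-preferred response above the rejected one, so training
    pushes scores toward whatever raters prefer.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores: if raters systematically prefer
# confident, fluent answers, those answers end up as the "chosen"
# side of the pair, and the model learns to reward that style,
# regardless of correctness.
confident_answer_score = 2.0
hedged_answer_score = 0.5
loss = preference_loss(confident_answer_score, hedged_answer_score)
```

With equal scores the loss is `-log(0.5) ≈ 0.693`; the wider the margin in favor of the chosen response, the smaller the loss, which is the mechanism by which rater style preferences get baked into the model.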
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
FLUKE introduces a novel framework for evaluating the robustness of NLP models through controlled linguistic variations, revealing task-d...
This paper explores the decomposition of representation spaces in neural networks into interpretable subspaces using an unsupervised lear...
This paper presents the Community Alignment Dataset, which aims to address the challenge of aligning large language models (LLMs) with di...
This article explores how AI agents imitating human content affect information diversity, revealing context-dependent outcomes in homogen...
The paper presents TrapFlow, a novel defense mechanism against website fingerprinting attacks using dynamic backdoor learning to enhance ...
This article explores the efficacy of one-run auditing in differential privacy, highlighting its potential to improve the auditing proces...
This paper explores the limitations and potential of multi-neuron convex relaxations in neural network certification, revealing a univers...
The paper presents 'AI Epidemiology', a framework for enhancing explainability in AI systems through expert oversight, using population-l...
This paper presents a neuromorphic architecture for scalable event-based control, leveraging the rebound Winner-Take-All motif to integra...
This paper presents an adaptive shielding framework for reinforcement learning that utilizes GR(1) specifications to ensure safety and li...
The paper explores a framework for balancing AI agent autonomy and human oversight through a cooperative game model, ensuring safety with...
This article discusses a framework that enhances the reliability of large language model (LLM) raters by inferring thinking traces from l...
This paper presents a novel approach to generating causal explanations for image classifiers, introducing a black-box algorithm grounded ...
This article reviews governance frameworks for Generative AI, focusing on how companies can effectively manage the integration of large l...
The paper introduces VeriSoftBench, a benchmark for formal verification in Lean, highlighting its limitations and performance insights fr...
This paper presents a framework for autonomous vehicles to safely interact with cyclists by integrating Hamilton-Jacobi reachability anal...
This paper investigates the adversarial robustness of discrete image tokenizers, highlighting their vulnerabilities and proposing a novel...
This paper explores the generalization and robustness of Conditional Value-at-Risk (CVaR) in the context of heavy-tailed data, providing ...
CityGuard introduces a novel framework for privacy-preserving identity retrieval across urban surveillance cameras, addressing challenges...
ZACH-ViT introduces a novel Vision Transformer architecture tailored for medical imaging, enhancing performance by removing fixed spatial...