NHS staff resist using Palantir software, reportedly citing ethics concerns, privacy worries, and doubts that the platform adds much
submitted by /u/esporx [link] [comments]
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
This paper explores methods to reduce biases in record matching through score calibration, proposing two model-agnostic post-processing t...
The paper presents a robust Taylor-Lagrange Control (rTLC) method for safety-critical systems, addressing the feasibility preservation pr...
The LLMbda Calculus introduces a formal framework for understanding AI agents' conversations, addressing vulnerabilities like prompt inje...
The paper introduces SkillInject, a benchmark for evaluating the vulnerability of LLM agents to skill file attacks, revealing high suscep...
This article presents a novel approach to conformal risk control for non-monotonic losses, extending traditional methods to multidimensio...
The paper presents CORE, a novel safety framework for open-world robots that enables contextual reasoning and enforcement of safety rules...
This article presents a framework for assessing the risks associated with using large language models (LLMs) in mental health support, hi...
The paper explores the 'Invisible Gorilla Effect' in out-of-distribution (OOD) detection, revealing that detection performance varies bas...
This paper addresses the modality gap in multimodal medical representation alignment, proposing a framework to enhance alignment between ...
The paper presents GOAL, a framework for Continual Generalized Category Discovery (C-GCD) that enhances class discovery while minimizing ...
The paper presents FairFS, a novel algorithm designed to address biases in feature selection for recommender systems, enhancing accuracy ...
The paper discusses the need for system-level threat monitoring in LLM-enabled applications, highlighting security challenges and advocat...
The paper presents MAS-FIRE, a framework for evaluating the reliability of LLM-based Multi-Agent Systems through fault injection, address...
The paper presents SafePickle, a machine-learning-based scanner designed to detect malicious Pickle-based ML models, achieving a high F1-...
The paper presents RobPI, a robust private inference protocol designed to counteract malicious client attacks, demonstrating significant ...
This article introduces Dirichlet Scale Mixture (DSM) priors for Bayesian Neural Networks, addressing limitations in interpretability and...
The paper proposes Carbon-Aware Governance Gates (CAGG) to integrate sustainability into Generative AI development, addressing the increa...
This article presents a novel framework, FedTAR, for generating personalized longitudinal medical reports using federated learning that a...
This article presents workflow-level design principles for integrating trustworthy Generative AI in automotive system engineering, addres...
The paper discusses the design of human-AI coexistence, emphasizing the need for governance frameworks to ensure responsible collaboratio...