NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much.
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper presents LA-LoRA, a novel approach for fine-tuning large models in privacy-preserving federated learning, addressing key challe...
This article evaluates the operational robustness of large language models (LLMs) in code generation, proposing a new method to assess th...
The paper presents MANATEE, a novel defense mechanism for large language models (LLMs) against adversarial attacks, utilizing a lightweig...
This paper introduces generalized Hessian estimators for stochastic optimization using random direction stochastic approximation, demonst...
The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that enhances reasoning by grounding predic...
This paper advocates for the machine learning community to adopt data frugality in AI development, emphasizing its environmental benefits...
The paper presents UFO, a quantized two-party computation framework that optimizes private CNN inference by combining efficient protocols...
The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...
The paper presents GRAPHIC, a novel approach using network science to analyze confusion matrices in deep learning, enhancing understandin...
The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...
This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...
This article presents the Advantage-based Adversarial Transformer (AAT), a novel method for generating time-correlated adversarial exampl...
Luna-2 introduces a scalable architecture for single-token evaluation using small language models, enhancing accuracy and reducing costs ...
This paper presents a novel framework for voice classification of Parkinson's and ALS using fairness-aware partial-label domain adaptatio...
The paper critiques the reliance on softmax outputs in adaptive conformal classification, proposing a new method that utilizes pre-softma...
This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used i...
The paper introduces Spiking Graph Predictive Coding (SIGHT), a novel approach to enhance out-of-distribution (OOD) generalization in gra...
This paper presents a method for enhancing stability in deep reinforcement learning by utilizing isotropic Gaussian representations, addr...
The paper explores the effectiveness of unanimous committees of Large Language Models (LLMs) in evaluating SQL queries, revealing insight...
The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...