NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much.
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper discusses a novel automated pipeline for detecting unverbalized biases in Large Language Models (LLMs), highlighting its effect...
This paper analyzes DARPA's AI Cyber Challenge (AIxCC), focusing on competition design, architectural approaches of finalists, and key le...
This study audits the collaboration between online graduate CS students and AI, exploring preferences for automation in academic tasks an...
This paper presents Contrastive Object-centric Diffusion Alignment (CODA), an enhancement to object-centric learning that reduces slot en...
This article explores the integration of Theory of Mind (ToM) in human-robot interaction (HRI) to enhance robot interpretability and user...
This article introduces the Block-Recurrent Hypothesis (BRH) for Vision Transformers, proposing a new framework for understanding their c...
This article explores the biases inherent in post-hoc feature attribution methods used in language models, revealing how lexical and posi...
The paper presents a novel method for generating high-fidelity local explanations for black-box machine learning models using multivariat...
The paper presents Empathetic Cascading Networks (ECN), a multi-stage prompting technique aimed at enhancing the empathetic responses of ...
This paper discusses Semi-Supervised Preference Optimization (SSPO), which reduces the need for extensive labeled feedback in preference ...
VeriStruct is a novel framework for AI-assisted automated verification of complex data structure modules in Verus, achieving a high succe...
LRT-Diffusion introduces a risk-aware sampling method for diffusion policies in offline reinforcement learning, enhancing decision-making...
The VERA-MH Concept Paper outlines an innovative framework for evaluating AI chatbots in mental health contexts, focusing on suicide risk...
This article presents a novel watermarking technique specifically designed for diffusion language models (DLMs), addressing challenges in...
The paper introduces a novel method called discrete optimal transport voice conversion (kDOT-VC), demonstrating its effectiveness as an a...
This paper explores the use of Large Language Models (LLMs) to simulate voting behavior in the European Parliament through persona-driven...
This systematic literature review explores Explanation User Interfaces (XUIs) in AI, emphasizing the importance of effective user explana...
The paper presents Cert-SSBD, a novel method for defending against backdoor attacks in deep neural networks by optimizing noise levels sp...
This paper explores the integration of Self-Organizing Maps (SOMs) with Vision Transformers (ViTs) to enhance performance on small datase...
The paper introduces Rex, a family of reversible exponential (stochastic) Runge-Kutta solvers designed to enhance the inversion accuracy ...