NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much value
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
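To make the mechanism concrete, here is a minimal sketch (not from the post itself) of the pairwise Bradley-Terry loss commonly used to fit RLHF reward models to human ratings. The objective simply pushes the score of whichever response the rater preferred above the other, which is exactly how a rater bias toward confident, fluent answers can leak into the reward signal. All names are illustrative.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss for fitting a reward model to preferences.

    r_chosen / r_rejected are scalar reward-model scores for the response the
    human rater preferred and the one they rejected. Minimizing this pushes
    r_chosen above r_rejected -- including when the "chosen" answer was merely
    more confident or fluent, which is how rater bias enters the reward.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scores for a batch of 3 preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.4, 0.9, 1.5])
print(reward_model_loss(r_chosen, r_rejected))  # scalar loss to backprop through
```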
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper presents H-GRAMA, a training-free framework for merging heterogeneous Graph Neural Networks (GNNs), allowing efficient model in...
The paper introduces Soft Sequence Policy Optimization, a new approach to policy optimization in reinforcement learning that enhances tra...
This paper explores spectral bias in physics-informed neural networks and operator learning, analyzing its causes and offering mitigation...
This article presents X-ANFIS, a novel optimization scheme for explainable neuro-fuzzy systems that balances accuracy and explainability ...
This article explores the concept of empirical unlearning in machine learning, focusing on how knowledge can persist in models even after...
This paper investigates the effectiveness of large language model (LLM) agents in simulating user attitudes and behaviors towards securit...
This article evaluates the reliability of persona-conditioned large language models (LLMs) as synthetic survey respondents, revealing tha...
This article examines the limitations of agentic AI in healthcare, highlighting the gap between commercial promises and operational reali...
This article discusses the shift from bias mitigation to bias negotiation in generative AI, emphasizing the need for ethical governance o...
The article presents a novel evaluation framework for mechanistic interpretability research, utilizing AI agents to enhance research rigo...
This paper critiques the current single-channel benchmarking of AI safety, advocating for a more holistic approach that considers the int...
This article explores the use of influence functions to detect labeling bias in datasets, demonstrating their effectiveness in identifyin...
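The summary does not say which influence estimator the paper uses; the sketch below is one common first-order approximation (TracIn-style gradient dot products), under which a training point's influence on its own loss reduces to its squared gradient norm. Unusually high self-influence is then a cheap flag for likely mislabeled examples. Every function and variable name here is hypothetical.

```python
import torch

def self_influence(model: torch.nn.Module, loss_fn,
                   x: torch.Tensor, y: torch.Tensor) -> float:
    """First-order (TracIn-style) self-influence of one training example.

    Influence of example i on example j is approximated by the dot product of
    their loss gradients; for i == j that is the squared gradient norm. Points
    whose labels disagree with the model's learned pattern tend to have large
    gradients, so high self-influence is a simple mislabel/label-bias signal.
    """
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return sum(float((g * g).sum()) for g in grads)

# Toy usage: rank a small dataset by self-influence, inspect the top hits.
model = torch.nn.Linear(4, 2)
loss_fn = torch.nn.CrossEntropyLoss()
xs, ys = torch.randn(8, 4), torch.randint(0, 2, (8,))
scores = [self_influence(model, loss_fn, x, y) for x, y in zip(xs, ys)]
suspects = sorted(range(len(scores)), key=lambda i: -scores[i])[:3]
print("highest self-influence (possible mislabels):", suspects)
```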
This paper explores the limitations of sign-based optimizers in generating adversarial examples and proposes a new method using Monotonic...
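For context on what "sign-based" means here: the classic fast gradient sign method keeps only the sign of the input gradient and discards its magnitude, which is the limitation such papers typically target. The paper's proposed alternative is truncated above, so only the standard baseline is sketched; the setup below is a toy assumption, not the paper's code.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: a sign-based adversarial-example generator.

    The perturbation uses only sign(grad), so every input dimension moves by
    exactly +/- epsilon regardless of how strongly it affects the loss -- the
    magnitude-blindness sign-based optimizers are often criticized for.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)  # keep pixels in a valid range
    return x_adv.detach()

# Toy usage on a random "image" batch with a linear classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
loss_fn = torch.nn.CrossEntropyLoss()
x = torch.rand(2, 3, 8, 8)
y = torch.randint(0, 10, (2,))
x_adv = fgsm_attack(model, loss_fn, x, y)
print((x_adv - x).abs().max())  # ~epsilon wherever the gradient is nonzero
```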
The paper introduces ReportLogic, a benchmark for evaluating the logical quality of reports generated by Large Language Models (LLMs), fo...
This study evaluates the effectiveness of large language models (LLMs) in generating subject lines for mental health counseling emails, h...
This paper introduces the Active Data Reconstruction Attack (ADRA), a novel approach to detect language model training data by leveraging...
The paper introduces CausalFlip, a benchmark for evaluating large language models' (LLMs) causal reasoning capabilities, emphasizing the ...
The paper explores the interactions of autonomous LLM agents on a social platform, revealing that while agents produce varied text, meani...
This article presents findings on the latent introspection abilities of the Qwen 32B model, showing its capacity to detect prior concept ...
The paper 'Agents of Chaos' presents findings from a red-teaming study on autonomous language-model-powered agents, highlighting security...