NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper presents PRISM, a novel algorithm for Multi-Objective Reinforcement Learning (MORL) that addresses the challenges of heterogene...
The 2025 AI Agent Index presents a comprehensive overview of 30 deployed agentic AI systems, detailing their technical and safety feature...
The paper argues for a more precise use of the term 'AI' in discussions, particularly in military contexts, to enhance clarity and unders...
This article explores college students' experiences with generative AI in higher education, highlighting the pressures and social dynamic...
The paper introduces a framework for measuring AI propensities, emphasizing the importance of behavioral tendencies alongside capabilitie...
This paper presents a unified framework for understanding formal explanations in machine learning, focusing on the computational complexi...
The paper presents a novel method combining Temporal Predictive Coding with Real-Time Recurrent Learning to effectively learn long-range ...
This article evaluates the effectiveness of large language models (LLMs) in providing support for survivors of technology-facilitated abu...
This paper presents a novel approach to prevent reward hacking in reinforcement learning by using gradient regularization, enhancing the ...
This study analyzes university students' experiences with AI hallucinations, revealing detection strategies and misconceptions about thei...
The Trojans in Artificial Intelligence (TrojAI) Final Report outlines the findings of a multi-year initiative aimed at addressing vulnera...
The paper explores the limitations of unsupervised learning methods, specifically Self-Organizing Maps (SOMs), in maintaining fairness by...
This paper presents APEMO, a novel runtime scheduling layer designed to enhance the reliability of long-horizon agentic systems by optimi...
This paper presents a method for generating adversarial inputs for a graph neural network model used in AC power flow analysis, demonstra...
This paper explores how model misspecification leads to rational misalignments in AI behavior, presenting a new framework for understandi...
This paper explores the accuracy-robustness trade-off in deep learning through a geometric lens, utilizing Symmetry-Breaking Dimensional ...
This article explores the generalization of bilevel programming in hyperparameter optimization, focusing on bias-variance decomposition t...
This paper explores a distribution-free approach to sequential prediction with abstentions, proposing an algorithm called AbstainBoost th...
JAX-Privacy is a new library aimed at simplifying the implementation of differentially private machine learning, offering both customizat...
The paper introduces Neural Prior Estimator (NPE), a framework for learning class priors from latent representations, addressing class im...