NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much.
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The eBook discusses the 2025 AI hype correction, highlighting unmet promises by AI leaders and the need for realistic expectations in AI ...
Anthropic faces scrutiny from the Pentagon over its refusal to allow its AI in military operations, risking a $200 million contract due t...
Amazon attributes recent AWS outages to human errors involving its AI coding assistant, Kiro, highlighting the challenges of AI integrati...
Toy Story 5 introduces a new villain, an AI tablet named Lilypad, highlighting the tension between traditional toys and modern technology...
DeepSeek-V3's analysis reveals a troubling view on public truth-telling in China, suggesting that those unable to remain silent may need ...
The article explores the potential of relocating AI data centers to outer space to mitigate environmental impacts on Earth, highlighting ...
Discussion thread for the upcoming release of FAccT 2026 paper reviews, encouraging community engagement and insights on fairness, accoun...
A study reveals that Chinese AI chatbots are more likely to censor politically sensitive questions than their non-Chinese counterp...
The Pentagon's CTO, Emil Michael, emphasizes the need for tailored AI regulations in military applications amid a dispute with Anthropic ...
This paper introduces a framework for reliable representation learning in machine learning, emphasizing the importance of representation-...
The paper presents a novel method for LLM fingerprinting using semantically conditioned watermarks, enhancing robustness against common d...
This paper presents a new framework, MAR-S, for robust and efficient inference with unstructured data, addressing biases in neural networ...
The paper presents Contrastive Diffusion Alignment (ConDA), a method that enhances the interpretability and control of diffusion models b...
This paper presents the Fair-SMW algorithm, an innovative approach to spectral clustering that enhances computational efficiency while en...
The paper introduces GGBall, a novel graph generative model utilizing hyperbolic geometry to enhance the generation of hierarchical struc...
This paper explores risk-aware decision-making in restless bandits, proposing new algorithms for planning and learning that incorporate r...
This article presents a novel training framework for instruction-following language models that maintains safety during fine-tuning by ad...
The article presents 'genriesz', an open-source Python package designed for automatic debiased machine learning using generalized Riesz r...
The paper 'ABCD: All Biases Come Disguised' explores biases in LLMs during multiple-choice question evaluations, proposing a new protocol...
This paper explores representation collapse in neural machine translation models, particularly focusing on the Transformer architecture a...