NHS staff resist using Palantir software, reportedly citing ethics concerns, privacy worries, and doubts that the platform adds much
submitted by /u/esporx
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback: humans rate the responses they like, and it turns out humans consistently rate confident, fluent, agree...
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper introduces SocialHarmBench, a dataset designed to evaluate the vulnerabilities of large language models (LLMs) to socially harm...
This article explores the use of NLP and machine learning techniques for enhancing malware classification accuracy, achieving a notable 9...
The paper introduces Probability Bounding (PB), a novel post-hoc calibration method that uses Box-Constrained Softmax to improve the cali...
The paper presents SafeFlowMatcher, a new planning framework that integrates flow matching with control barrier functions to ensure safe ...
The paper presents AttestLLM, a novel framework for efficiently attesting billion-scale on-device LLMs, ensuring model legitimacy and pro...
The paper presents a novel watermarking scheme for black-box language models, enabling detection of model outputs without requiring white...
This article explores the development of role-aware language models designed to enhance access control in organizational settings, focusi...
Winsor-CAM introduces a novel method for visual explanations in deep networks, enhancing interpretability through human-tunable parameter...
The paper presents AbstRaL, a method to enhance large language models' reasoning capabilities by reinforcing abstract thinking, particula...
The paper presents SAGE-5GC, a set of security-aware guidelines for evaluating anomaly detection in the 5G Core Network, addressing chall...
The paper discusses the systemic risks posed by algorithmic collisions in interconnected AI systems, highlighting the need for improved g...
The paper explores how fine-tuning large language models can unintentionally create vulnerabilities, analyzing factors like dataset chara...
The paper presents BitHydra, a framework for executing bit-flip inference cost attacks on large language models (LLMs), demonstrating how...
This paper explores the balance between watermark strength and speculative sampling efficiency in language models, proposing a new approa...
The paper introduces FaLW, a novel method for machine unlearning that addresses challenges in long-tailed data scenarios, enhancing data ...
This article systematically compares various explainability methods for detecting hardware trojans, focusing on their effectiveness in pr...
This article surveys advancements in multi-turn interactions with large language models (LLMs), focusing on evaluation methods, challenge...
The paper presents JavisDiT, a novel Joint Audio-Video Diffusion Transformer that enhances synchronized audio-video generation through a ...
This article presents a novel approach to polyphonic music generation using structural inductive bias, focusing on Beethoven's piano sona...
The paper presents a novel defense mechanism against adversarial attacks in machine learning using a soft-gated fractional mixture-of-exp...