NHS staff are resisting Palantir's software, reportedly citing ethics concerns, privacy worries, and doubts that the platform adds much
Alignment, bias, regulation, and responsible AI
RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
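The preference step described in that item can be sketched in a few lines. This is a minimal, illustrative example of the pairwise (Bradley-Terry) loss commonly used to fit RLHF reward models; the function name and scores are hypothetical, not taken from the linked post.

```python
# Minimal sketch of the RLHF preference step: a reward model is fit so that
# the response a human preferred scores higher than the one they rejected
# (Bradley-Terry pairwise loss). Scores here stand in for reward-model outputs.
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the chosen response
    already outscores the rejected one, large when the ordering is wrong."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If raters systematically prefer confident, fluent answers, this is the
# signal the loss rewards -- independent of whether the answer is correct.
print(round(pairwise_preference_loss(2.0, 0.5), 4))  # preference satisfied: ~0.2014
print(round(pairwise_preference_loss(0.5, 2.0), 4))  # ordering wrong: ~1.7014
```

The point the item makes falls out of the loss: whatever style raters consistently prefer is exactly what gradient descent on this objective amplifies.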
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
The paper presents new privacy-preserving protocols for verifiable inference of large language models (LLMs), addressing the challenges o...
The paper introduces NeST, a novel framework for enhancing safety in large language models (LLMs) by selectively tuning a small subset of...
This paper explores substantive fairness in conformal prediction, analyzing its impact on downstream decision-making and proposing method...
The A.R.I.S. system utilizes deep learning to enhance e-waste recycling by accurately classifying materials in real-time, improving recov...
This paper introduces One-Shot Incremental Federated Learning (OSI-FL), a novel framework that mitigates catastrophic forgetting and comm...
This paper presents KD-UFSL, a method to enhance privacy in federated split learning by minimizing data leakage through intermediate repr...
This paper explores weight regularization techniques in low-rank continual learning, proposing EWC-LoRA to mitigate task interference whi...
This article presents a theoretical framework for modular learning in robust generative models, exploring the combination of domain-speci...
The paper presents LexiSafe, a novel offline safe reinforcement learning framework that employs a lexicographic safety-reward hierarchy t...
This paper presents an efficient method for privacy loss accounting in subsampling and random allocation, demonstrating advantages over t...
CounterFlowNet introduces a novel generative approach for creating counterfactual explanations in machine learning, enhancing interpretab...
This paper introduces a locality radius framework to understand relational inductive bias in database learning, focusing on the necessary...
The paper presents MeGU, a novel framework for machine unlearning that addresses the challenge of effectively erasing target data while p...
The paper introduces UniLeak, a framework that identifies universal activation directions in language models, enhancing the understanding...
This paper proposes a fail-closed alignment mechanism for large language models (LLMs) to enhance their safety and robustness against pro...
This paper presents a framework for certifying data-poisoning attacks in neural networks using mixed-integer programming, ensuring robust...
This paper analyzes how two-layer neural networks learn to solve the modular addition task, providing insights into feature learning, tra...
This paper presents a residual-aware theory explaining the position bias in Transformers, revealing how residual connections prevent atte...
This article presents a novel approach to automated circuit discovery in neural networks, emphasizing provable guarantees for robustness ...
This paper explores omitted variable bias in language models under distribution shifts, proposing a framework to evaluate and optimize pe...