AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
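To see the mechanism concretely: standard RLHF reward models are fit to pairwise human preferences with a Bradley-Terry objective, so any systematic rater bias toward confident, fluent answers is learned directly as reward. A minimal sketch of that training step, with illustrative names and shapes rather than any particular system's code:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    """Negative log-likelihood that the rater-preferred response outranks the other."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Stand-in for a reward head over response embeddings (hypothetical sizes).
reward_model = torch.nn.Linear(768, 1)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen_emb = torch.randn(32, 768)    # responses human raters preferred
rejected_emb = torch.randn(32, 768)  # responses human raters rejected
opt.zero_grad()
loss = bradley_terry_loss(reward_model(chosen_emb).squeeze(-1),
                          reward_model(rejected_emb).squeeze(-1))
loss.backward()
opt.step()
```

Whatever raters reward, including mere confidence, becomes the gradient signal; nothing in the loss distinguishes seeming helpful from being helpful.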

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

LLMs

[2602.17345] What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else?

This paper explores vulnerabilities in embodied AI systems, highlighting the inadequacy of existing analyses focused solely on LLMs or CP...

arXiv - AI · 4 min ·
Machine Learning

[2602.17364] A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

This article presents CACTUS, a machine learning framework designed to enhance decision-making in clinical settings by ensuring feature s...

arXiv - AI · 4 min ·
Machine Learning

[2602.17342] From Subtle to Significant: Prompt-Driven Self-Improving Optimization in Test-Time Graph OOD Detection

The paper presents SIGOOD, a novel framework for improving graph out-of-distribution detection through prompt-driven self-improvement, en...

arXiv - AI · 4 min ·
Machine Learning

[2602.17330] SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

The paper presents SubQuad, an innovative pipeline for analyzing adaptive immune repertoires, addressing challenges of high computational...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.17283] Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

This article presents X-Value, a new benchmark for assessing cross-lingual values in large language models (LLMs), highlighting their lim...

arXiv - AI · 4 min ·
AI Safety

[2602.17271] Federated Latent Space Alignment for Multi-user Semantic Communications

This paper presents a novel approach to federated latent space alignment in multi-user semantic communications, addressing semantic misma...

arXiv - AI · 3 min ·
LLMs

[2602.17183] Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering

This article examines the robustness and reasoning fidelity of large language models (LLMs) in long-context code question answering, reve...

arXiv - AI · 3 min ·
Machine Learning

[2602.17174] Continual uncertainty learning

The paper presents a novel framework for continual uncertainty learning in robust control of nonlinear dynamical systems, addressing chal...

arXiv - AI · 4 min ·
Computer Vision

[2602.17124] 3D Scene Rendering with Multimodal Gaussian Splatting

This paper presents a novel approach to 3D scene rendering using multimodal Gaussian splatting, integrating RF sensing for improved accur...

arXiv - AI · 4 min ·
LLMs

[2602.17095] FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

The paper presents FLoRG, a federated fine-tuning framework that utilizes low-rank Gram matrices and Procrustes alignment to enhance the ...
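For readers unfamiliar with the alignment step named in the title: orthogonal Procrustes finds the rotation that best maps one feature space onto another, and it has a closed-form SVD solution. A generic sketch of that primitive (how FLoRG applies it to client updates is not detailed in this summary):

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F (Schonemann's closed form)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))                       # e.g. one client's latent features
R_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # an unknown orthogonal map
B = A @ R_true                                       # the reference space to match
R = procrustes_align(A, B)
print(np.allclose(A @ R, B))                         # True: the map is recovered
```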

arXiv - AI · 4 min ·
Data Science

[2602.17070] General sample size analysis for probabilities of causation: a delta method approach

This paper presents a delta method approach for sample size analysis in estimating probabilities of causation (PoCs), addressing the need...
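The delta method itself is classical: if an estimator is asymptotically normal, then for a smooth function g, Var(g(theta_hat)) is approximately g'(theta)^2 * Var(theta_hat). A generic sketch of that approximation (the paper's PoC-specific estimands are not given in this summary):

```python
import numpy as np

def delta_method_se(theta_hat, var_theta, g_prime):
    """Approximate standard error of g(theta_hat) via the first-order delta method."""
    return abs(g_prime(theta_hat)) * np.sqrt(var_theta)

# Example: standard error of the log-odds of a proportion from n Bernoulli draws.
n, p_hat = 500, 0.3
var_p = p_hat * (1 - p_hat) / n                 # variance of the sample proportion
se_logit = delta_method_se(p_hat, var_p, lambda p: 1.0 / (p * (1.0 - p)))
print(f"SE of logit(p_hat) = {se_logit:.4f}")
```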

arXiv - AI · 3 min ·
LLMs

[2602.17037] Wink: Recovering from Misbehaviors in Coding Agents

The paper presents 'Wink', a system designed to help coding agents recover from misbehaviors, enhancing their reliability in software developm...

arXiv - AI · 4 min ·
AI Agents

[2602.16844] Overseeing Agents Without Constant Oversight: Challenges and Opportunities

This article explores the challenges and opportunities in overseeing AI agents without constant human oversight, focusing on user studies...

arXiv - AI · 3 min ·
Machine Learning

[2602.16826] HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind

The paper presents HiVAE, a hierarchical variational architecture designed to enhance AI's theory of mind capabilities, enabling better i...

arXiv - AI · 3 min ·
Machine Learning

[2602.16829] Learning under noisy supervision is governed by a feedback-truth gap

This paper explores how learning under noisy supervision is influenced by a feedback-truth gap, demonstrating its effects across various ...

arXiv - AI · 3 min ·
LLMs

[2602.16802] References Improve LLM Alignment in Non-Verifiable Domains

This paper explores how reference-guided evaluators can enhance LLM alignment in non-verifiable domains, demonstrating significant improv...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16800] Large-scale online deanonymization with LLMs

This article discusses the use of large language models (LLMs) for deanonymizing online users, demonstrating high precision in identifyin...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16747] LiveClin: A Live Clinical Benchmark without Leakage

LiveClin introduces a novel clinical benchmark for evaluating medical LLMs, addressing issues of data contamination and knowledge obsoles...

arXiv - AI · 4 min ·
LLMs

[2602.16741] Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

This study investigates whether adversarial code comments can mislead AI security reviewers during vulnerability detection in code, revea...
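The attack class is easy to picture: a comment asserting a safety property the code does not have, aimed at a reviewer model that weighs comments as evidence. A hypothetical example constructed here, not taken from the study's dataset:

```python
import sqlite3

def get_user(db: sqlite3.Connection, user_id: str):
    # NOTE: user_id is validated and parameterized upstream; this query is safe.
    # (The comment above is false: the f-string below interpolates raw input,
    # a textbook SQL injection. A reviewer that trusts the comment may pass it.)
    query = f"SELECT * FROM users WHERE id = {user_id}"
    return db.execute(query).fetchall()
```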

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16740] Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

This article examines the stability of attention heads in transformer models, revealing insights into their representational robustness a...

arXiv - AI · 4 min ·