AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...
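To see the mechanism concretely: standard RLHF reward models are fit to pairwise human preferences with a Bradley-Terry objective, so any systematic rater bias toward confident, fluent answers is learned directly as reward. A minimal sketch of that training step, with illustrative names and shapes rather than any particular system's code:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen, r_rejected):
    """Negative log-likelihood that the rater-preferred response outranks the other."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Stand-in for a reward head over response embeddings (hypothetical sizes).
reward_model = torch.nn.Linear(768, 1)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen_emb = torch.randn(32, 768)    # responses human raters preferred
rejected_emb = torch.randn(32, 768)  # responses human raters rejected
opt.zero_grad()
loss = bradley_terry_loss(reward_model(chosen_emb).squeeze(-1),
                          reward_model(rejected_emb).squeeze(-1))
loss.backward()
opt.step()
```

Whatever raters reward, including mere confidence, becomes the gradient signal; nothing in the loss distinguishes seeming helpful from being helpful.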

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

LLMs

[2602.17345] What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else?

This paper explores vulnerabilities in embodied AI systems, highlighting the inadequacy of existing analyses focused solely on LLMs or CP...

arXiv - AI · 4 min ·
Machine Learning

[2602.17364] A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

This article presents CACTUS, a machine learning framework designed to enhance decision-making in clinical settings by ensuring feature s...

arXiv - AI · 4 min ·
Machine Learning

[2602.17342] From Subtle to Significant: Prompt-Driven Self-Improving Optimization in Test-Time Graph OOD Detection

The paper presents SIGOOD, a novel framework for improving graph out-of-distribution detection through prompt-driven self-improvement, en...

arXiv - AI · 4 min ·
Machine Learning

[2602.17330] SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

The paper presents SubQuad, an innovative pipeline for analyzing adaptive immune repertoires, addressing challenges of high computational...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.17283] Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

This article presents X-Value, a new benchmark for assessing cross-lingual values in large language models (LLMs), highlighting their lim...

arXiv - AI · 4 min ·
AI Safety

[2602.17271] Federated Latent Space Alignment for Multi-user Semantic Communications

This paper presents a novel approach to federated latent space alignment in multi-user semantic communications, addressing semantic misma...

arXiv - AI · 3 min ·
LLMs

[2602.17183] Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering

This article examines the robustness and reasoning fidelity of large language models (LLMs) in long-context code question answering, reve...

arXiv - AI · 3 min ·
Machine Learning

[2602.17174] Continual uncertainty learning

The paper presents a novel framework for continual uncertainty learning in robust control of nonlinear dynamical systems, addressing chal...

arXiv - AI · 4 min ·
Computer Vision

[2602.17124] 3D Scene Rendering with Multimodal Gaussian Splatting

This paper presents a novel approach to 3D scene rendering using multimodal Gaussian splatting, integrating RF sensing for improved accur...

arXiv - AI · 4 min ·
LLMs

[2602.17095] FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

The paper presents FLoRG, a federated fine-tuning framework that utilizes low-rank Gram matrices and Procrustes alignment to enhance the ...
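For readers unfamiliar with the alignment step named in the title: orthogonal Procrustes finds the rotation that best maps one feature space onto another, and it has a closed-form SVD solution. A generic sketch of that primitive (how FLoRG applies it to client updates is not detailed in this summary):

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F (Schonemann's closed form)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 16))                       # e.g. one client's latent features
R_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # an unknown orthogonal map
B = A @ R_true                                       # the reference space to match
R = procrustes_align(A, B)
print(np.allclose(A @ R, B))                         # True: the map is recovered
```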

arXiv - AI · 4 min ·
Data Science

[2602.17070] General sample size analysis for probabilities of causation: a delta method approach

This paper presents a delta method approach for sample size analysis in estimating probabilities of causation (PoCs), addressing the need...
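The delta method itself is classical: if an estimator is asymptotically normal, then for a smooth function g, Var(g(theta_hat)) is approximately g'(theta)^2 * Var(theta_hat). A generic sketch of that approximation (the paper's PoC-specific estimands are not given in this summary):

```python
import numpy as np

def delta_method_se(theta_hat, var_theta, g_prime):
    """Approximate standard error of g(theta_hat) via the first-order delta method."""
    return abs(g_prime(theta_hat)) * np.sqrt(var_theta)

# Example: standard error of the log-odds of a proportion from n Bernoulli draws.
n, p_hat = 500, 0.3
var_p = p_hat * (1 - p_hat) / n                 # variance of the sample proportion
se_logit = delta_method_se(p_hat, var_p, lambda p: 1.0 / (p * (1.0 - p)))
print(f"SE of logit(p_hat) = {se_logit:.4f}")
```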

arXiv - AI · 3 min ·
LLMs

[2602.17037] Wink: Recovering from Misbehaviors in Coding Agents

The paper presents 'Wink', a system designed to help coding agents recover from misbehaviors, enhancing their reliability in software developm...

arXiv - AI · 4 min ·
AI Agents

[2602.16844] Overseeing Agents Without Constant Oversight: Challenges and Opportunities

This article explores the challenges and opportunities in overseeing AI agents without constant human oversight, focusing on user studies...

arXiv - AI · 3 min ·
Machine Learning

[2602.16826] HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind

The paper presents HiVAE, a hierarchical variational architecture designed to enhance AI's theory of mind capabilities, enabling better i...

arXiv - AI · 3 min ·
Machine Learning

[2602.16829] Learning under noisy supervision is governed by a feedback-truth gap

This paper explores how learning under noisy supervision is influenced by a feedback-truth gap, demonstrating its effects across various ...

arXiv - AI · 3 min ·
LLMs

[2602.16802] References Improve LLM Alignment in Non-Verifiable Domains

This paper explores how reference-guided evaluators can enhance LLM alignment in non-verifiable domains, demonstrating significant improv...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16800] Large-scale online deanonymization with LLMs

This article discusses the use of large language models (LLMs) for deanonymizing online users, demonstrating high precision in identifyin...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16747] LiveClin: A Live Clinical Benchmark without Leakage

LiveClin introduces a novel clinical benchmark for evaluating medical LLMs, addressing issues of data contamination and knowledge obsoles...

arXiv - AI · 4 min ·
LLMs

[2602.16741] Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

This study investigates whether adversarial code comments can mislead AI security reviewers during vulnerability detection in code, revea...
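The attack class is easy to picture: a comment asserting a safety property the code does not have, aimed at a reviewer model that weighs comments as evidence. A hypothetical example constructed here, not taken from the study's dataset:

```python
import sqlite3

def get_user(db: sqlite3.Connection, user_id: str):
    # NOTE: user_id is validated and parameterized upstream; this query is safe.
    # (The comment above is false: the f-string below interpolates raw input,
    # a textbook SQL injection. A reviewer that trusts the comment may pass it.)
    query = f"SELECT * FROM users WHERE id = {user_id}"
    return db.execute(query).fetchall()
```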

arXiv - Machine Learning · 4 min ·
LLMs

[2602.16740] Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

This article examines the stability of attention heads in transformer models, revealing insights into their representational robustness a...

arXiv - AI · 4 min ·