AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
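
The failure mode this thread describes follows directly from how RLHF reward models are trained: they are typically fit to pairwise human preferences with a Bradley-Terry style loss, so they learn to predict which answer a rater will pick, not which answer is true. Below is a minimal Python sketch of that loss with hypothetical numbers; nothing here is taken from the linked thread.

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this trains a reward model to score whichever response
    the human rater picked above the one they rejected.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical rating data: raters tend to pick the confident, fluent
# answer over the hedged-but-accurate one, so "chosen" is the confident one.
pairs = [
    (2.0, 0.5),  # (score of confident answer, score of accurate answer)
    (1.5, 0.2),
]
for r_confident, r_accurate in pairs:
    # Low loss here means the reward model is being paid to rank
    # confidence above accuracy, because that is what the labels say.
    print(f"loss: {bt_loss(r_confident, r_accurate):.3f}")
```

If raters systematically reward confidence, this objective makes sycophancy the optimum; nothing in it references ground truth.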
AI Safety

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2602.19332] Training-Free Cross-Architecture Merging for Graph Neural Networks

The paper presents H-GRAMA, a training-free framework for merging heterogeneous Graph Neural Networks (GNNs), allowing efficient model in...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.19327] Soft Sequence Policy Optimization: Bridging GMPO and SAPO

The paper introduces Soft Sequence Policy Optimization, a new approach to policy optimization in reinforcement learning that enhances tra...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.19265] Spectral bias in physics-informed and operator learning: Analysis and mitigation guidelines

This paper explores spectral bias in physics-informed neural networks and operator learning, analyzing its causes and offering mitigation...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.19253] Alternating Bi-Objective Optimization for Explainable Neuro-Fuzzy Systems

This article presents X-ANFIS, a novel optimization scheme for explainable neuro-fuzzy systems that balances accuracy and explainability ...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.19215] Understanding Empirical Unlearning with Combinatorial Interpretability

This article explores the concept of empirical unlearning in machine learning, focusing on how knowledge can persist in models even after...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.18464] How Well Can LLM Agents Simulate End-User Security and Privacy Attitudes and Behaviors?

This paper investigates the effectiveness of large language model (LLM) agents in simulating user attitudes and behaviors towards securit...

arXiv - AI · 4 min ·
LLMs

[2602.18462] Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents

This article evaluates the reliability of persona-conditioned large language models (LLMs) as synthetic survey respondents, revealing tha...

arXiv - AI · 3 min ·
AI Safety

[2602.18460] The Doctor Will (Still) See You Now: On the Structural Limits of Agentic AI in Healthcare

This article examines the limitations of agentic AI in healthcare, highlighting the gap between commercial promises and operational reali...

arXiv - AI · 4 min ·
LLMs

[2602.18459] From Bias Mitigation to Bias Negotiation: Governing Identity and Sociocultural Reasoning in Generative AI

This article discusses the shift from bias mitigation to bias negotiation in generative AI, emphasizing the need for ethical governance o...

arXiv - AI · 4 min ·
Machine Learning

[2602.18458] The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research

The article presents a novel evaluation framework for mechanistic interpretability research, utilizing AI agents to enhance research rigo...

arXiv - Machine Learning · 3 min ·
AI Safety

[2602.18456] Beyond single-channel agentic benchmarking

This paper critiques the current single-channel benchmarking of AI safety, advocating for a more holistic approach that considers the int...

arXiv - AI · 3 min ·
Machine Learning

[2602.19130] Detecting labeling bias using influence functions

This article explores the use of influence functions to detect labeling bias in datasets, demonstrating their effectiveness in identifyin...

arXiv - AI · 4 min ·
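
For readers unfamiliar with the technique in the title: a standard recipe from the influence-function literature (Koh & Liang, 2017) is to score each training point by its self-influence, gradᵀ H⁻¹ grad, and inspect the highest-scoring points as likely label errors. The toy below applies that recipe to logistic regression with deliberately flipped labels; it illustrates the general idea only and is not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
y[:5] = 1 - y[:5]  # inject five flipped labels to act as "labeling bias"

# Fit logistic regression by plain gradient descent on the log-loss.
w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Self-influence s_i = g_i^T H^{-1} g_i, with per-example gradients g_i
# and the (ridge-damped) Hessian H of the training loss.
p = 1.0 / (1.0 + np.exp(-X @ w))
grads = (p - y)[:, None] * X
H = (X * (p * (1 - p))[:, None]).T @ X / len(y) + 1e-3 * np.eye(2)
scores = np.einsum("ij,jk,ik->i", grads, np.linalg.inv(H), grads)

# The flipped points (indices 0-4) typically rank near the top of this list.
print("most suspicious labels:", np.argsort(-scores)[:5])
```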
Machine Learning

[2602.19096] The Power of Decaying Steps: Enhancing Attack Stability and Transferability for Sign-based Optimizers

This paper explores the limitations of sign-based optimizers in generating adversarial examples and proposes a new method using Monotonic...

arXiv - Machine Learning · 4 min ·
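
The proposed method's name is cut off in the summary above, so the sketch below shows only the generic ingredient the title points at: an iterative sign-based attack update, x ← clip(x + α_t · sign(∇_x L(x))), with a step size α_t that decays across iterations. The 1/√t schedule and the toy loss gradient are stand-in assumptions, not the paper's choices.

```python
import numpy as np

def grad_loss(x: np.ndarray) -> np.ndarray:
    # Toy stand-in for the gradient of a model's loss w.r.t. the input.
    return 2.0 * (x - 3.0)

x0 = np.zeros(4)  # clean input
eps = 0.5         # L-infinity perturbation budget
x = x0.copy()
for t in range(1, 11):
    alpha = 0.1 / np.sqrt(t)               # decaying step size (assumed schedule)
    x = x + alpha * np.sign(grad_loss(x))  # signed-gradient ascent on the loss
    x = np.clip(x, x0 - eps, x0 + eps)     # project back into the eps-ball
print("final perturbation:", x - x0)
```

A fixed α tends to oscillate around the budget boundary; shrinking α_t stabilizes the final iterate, which is the intuition behind decaying steps in this setting.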
LLMs

[2602.18446] ReportLogic: Evaluating Logical Quality in Deep Research Reports

The paper introduces ReportLogic, a benchmark for evaluating the logical quality of reports generated by Large Language Models (LLMs), fo...

arXiv - AI · 4 min ·
LLMs

[2602.18443] From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications

This study evaluates the effectiveness of large language models (LLMs) in generating subject lines for mental health counseling emails, h...

arXiv - AI · 3 min ·
LLMs

[2602.19020] Learning to Detect Language Model Training Data via Active Reconstruction

This paper introduces the Active Data Reconstruction Attack (ADRA), a novel approach to detect language model training data by leveraging...

arXiv - AI · 4 min ·
LLMs

[2602.20094] CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching

The paper introduces CausalFlip, a benchmark for evaluating large language models' (LLMs) causal reasoning capabilities, emphasizing the ...

arXiv - AI · 4 min ·
LLMs

[2602.20059] Interaction Theater: A case of LLM Agents Interacting at Scale

The paper explores the interactions of autonomous LLM agents on a social platform, revealing that while agents produce varied text, meani...

arXiv - AI · 4 min ·
Machine Learning

[2602.20031] Latent Introspection: Models Can Detect Prior Concept Injections

This article presents findings on the latent introspection abilities of the Qwen 32B model, showing its capacity to detect prior concept ...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.20021] Agents of Chaos

The paper 'Agents of Chaos' presents findings from a red-teaming study on autonomous language-model-powered agents, highlighting security...

arXiv - AI · 4 min ·