AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts about whether the platform adds much value.

Reddit - Artificial Intelligence · 1 min
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min
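The dynamic this post describes can be sketched with the Bradley-Terry preference loss commonly used to train RLHF reward models (a minimal illustration under that assumption, not any particular lab's implementation; the reward values below are made up):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected). The loss only sees which
    response the rater preferred, not whether it was correct."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores: if raters reliably pick the confident, fluent answer,
# minimizing this loss pushes the reward model to score confidence highly,
# whether or not the answer is actually helpful.
low_loss = preference_loss(2.0, -1.0)   # model already ranks the rater's pick higher
high_loss = preference_loss(-1.0, 2.0)  # model ranks the rater's pick lower
```

Since raters never label correctness, nothing in this objective distinguishes "seems helpful" from "is helpful".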
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min

All Content

[2411.02317] Defining and Evaluating Physical Safety for Large Language Models
LLMs

This paper explores the physical safety of Large Language Models (LLMs) in controlling robotic systems, identifying risks and proposing a...

arXiv - AI · 4 min
[2602.06838] An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization
Machine Learning

This paper presents an adaptive differentially private federated learning framework that addresses challenges in model efficiency and sta...

arXiv - AI · 4 min
[2511.17673] Bridging Symbolic Control and Neural Reasoning in LLM Agents: Structured Cognitive Loop with a Governance Layer
LLMs

This article introduces the Structured Cognitive Loop (SCL) architecture for large language model (LLM) agents, addressing key architectu...

arXiv - AI · 4 min
[2510.00167] Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI
Robotics

The paper discusses how embodied AI enables drones to make adaptive landing decisions in real-time, enhancing their resilience and safety...

arXiv - AI · 3 min
[2507.23497] Sufficient, Necessary and Complete Causal Explanations in Image Classification
Machine Learning

This paper explores causal explanations in image classification, demonstrating their formal properties and computability, while introduci...

arXiv - AI · 4 min
[2503.23339] A Scalable Framework for Evaluating Health Language Models
LLMs

This paper presents a scalable framework for evaluating health language models, introducing Adaptive Precise Boolean rubrics to enhance e...

arXiv - AI · 4 min
[2502.13062] AI-Assisted Decision Making with Human Learning
AI Agents

This paper explores AI-assisted decision-making, focusing on how algorithms can enhance human learning through feature selection, balanci...

arXiv - AI · 4 min
[2602.17645] Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
LLMs

This paper presents M-Attack-V2, an advanced method for executing black-box attacks on Large Vision-Language Models (LVLMs) by improving ...

arXiv - AI · 4 min
[2602.17658] MARS: Margin-Aware Reward-Modeling with Self-Refinement
Machine Learning

The paper presents MARS, a novel margin-aware reward modeling framework that enhances training efficiency by focusing on ambiguous prefer...

arXiv - AI · 3 min
[2602.17633] When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
LLMs

The paper discusses the balance between weak and strong verification methods in reasoning with large language models (LLMs), emphasizing ...

arXiv - AI · 3 min
[2602.17608] Towards Anytime-Valid Statistical Watermarking
LLMs

The paper presents a novel framework for statistical watermarking in machine-generated content, addressing limitations of existing method...

arXiv - AI · 4 min
[2602.17605] Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery
AI Safety

This paper presents a novel framework for geospatial discovery that integrates active learning and online meta-learning, focusing on rele...

arXiv - Machine Learning · 4 min
[2602.17586] Conditional Flow Matching for Continuous Anomaly Detection in Autonomous Driving on a Manifold-Aware Spectral Space
Robotics

This paper presents Deep-Flow, an innovative framework for anomaly detection in autonomous driving, utilizing Optimal Transport Condition...

arXiv - Machine Learning · 4 min
[2602.17532] Systematic Evaluation of Single-Cell Foundation Model Interpretability Reveals Attention Captures Co-Expression Rather Than Unique Regulatory Signal
LLMs

This article evaluates the interpretability of single-cell foundation models, revealing that attention mechanisms capture co-expression r...

arXiv - AI · 3 min
[2602.17531] Position: Evaluation of ECG Representations Must Be Fixed
AI Startups

This paper critiques current benchmarking practices in 12-lead ECG representation learning, advocating for broader evaluation criteria to...

arXiv - AI · 4 min
[2602.17493] Learning with Boolean threshold functions
Machine Learning

This article presents a novel method for training neural networks on Boolean data using Boolean threshold functions (BTF), demonstrating ...

arXiv - AI · 4 min
[2602.17452] Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge
Machine Learning

Jolt Atlas introduces a zero-knowledge machine learning framework that enhances inference verification through lookup arguments, optimizi...

arXiv - AI · 4 min
[2602.17483] What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data
LLMs

This article presents a human-centered audit of how large language models (LLMs) associate personal data with individual names, highlight...

arXiv - AI · 3 min
[2602.17431] Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study
LLMs

This study presents a taxonomy for fine-grained uncertainty quantification in long-form language model outputs, highlighting effective me...

arXiv - Machine Learning · 3 min
[2602.17423] Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
Machine Learning

This paper explores the convergence of two-layer neural networks trained with Gaussian masked inputs, demonstrating linear convergence th...

arXiv - AI · 3 min

