AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts about whether the platform adds much value.

Reddit - Artificial Intelligence · 1 min
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min
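The dynamic this post describes can be sketched with the Bradley-Terry preference loss commonly used to train RLHF reward models (a minimal illustration under that assumption, not any particular lab's implementation; the reward values below are made up):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected). The loss only sees which
    response the rater preferred, not whether it was correct."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores: if raters reliably pick the confident, fluent answer,
# minimizing this loss pushes the reward model to score confidence highly,
# whether or not the answer is actually helpful.
low_loss = preference_loss(2.0, -1.0)   # model already ranks the rater's pick higher
high_loss = preference_loss(-1.0, 2.0)  # model ranks the rater's pick lower
```

Since raters never label correctness, nothing in this objective distinguishes "seems helpful" from "is helpful".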
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min

All Content

[2411.02317] Defining and Evaluating Physical Safety for Large Language Models
LLMs

This paper explores the physical safety of Large Language Models (LLMs) in controlling robotic systems, identifying risks and proposing a...

arXiv - AI · 4 min
[2602.06838] An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization
Machine Learning

This paper presents an adaptive differentially private federated learning framework that addresses challenges in model efficiency and sta...

arXiv - AI · 4 min
[2511.17673] Bridging Symbolic Control and Neural Reasoning in LLM Agents: Structured Cognitive Loop with a Governance Layer
LLMs

This article introduces the Structured Cognitive Loop (SCL) architecture for large language model (LLM) agents, addressing key architectu...

arXiv - AI · 4 min
[2510.00167] Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI
Robotics

The paper discusses how embodied AI enables drones to make adaptive landing decisions in real-time, enhancing their resilience and safety...

arXiv - AI · 3 min
[2507.23497] Sufficient, Necessary and Complete Causal Explanations in Image Classification
Machine Learning

This paper explores causal explanations in image classification, demonstrating their formal properties and computability, while introduci...

arXiv - AI · 4 min
[2503.23339] A Scalable Framework for Evaluating Health Language Models
LLMs

This paper presents a scalable framework for evaluating health language models, introducing Adaptive Precise Boolean rubrics to enhance e...

arXiv - AI · 4 min
[2502.13062] AI-Assisted Decision Making with Human Learning
AI Agents

This paper explores AI-assisted decision-making, focusing on how algorithms can enhance human learning through feature selection, balanci...

arXiv - AI · 4 min
[2602.17645] Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
LLMs

This paper presents M-Attack-V2, an advanced method for executing black-box attacks on Large Vision-Language Models (LVLMs) by improving ...

arXiv - AI · 4 min
[2602.17658] MARS: Margin-Aware Reward-Modeling with Self-Refinement
Machine Learning

The paper presents MARS, a novel margin-aware reward modeling framework that enhances training efficiency by focusing on ambiguous prefer...

arXiv - AI · 3 min
[2602.17633] When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
LLMs

The paper discusses the balance between weak and strong verification methods in reasoning with large language models (LLMs), emphasizing ...

arXiv - AI · 3 min
[2602.17608] Towards Anytime-Valid Statistical Watermarking
LLMs

The paper presents a novel framework for statistical watermarking in machine-generated content, addressing limitations of existing method...

arXiv - AI · 4 min
[2602.17605] Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery
AI Safety

This paper presents a novel framework for geospatial discovery that integrates active learning and online meta-learning, focusing on rele...

arXiv - Machine Learning · 4 min
[2602.17586] Conditional Flow Matching for Continuous Anomaly Detection in Autonomous Driving on a Manifold-Aware Spectral Space
Robotics

This paper presents Deep-Flow, an innovative framework for anomaly detection in autonomous driving, utilizing Optimal Transport Condition...

arXiv - Machine Learning · 4 min
[2602.17532] Systematic Evaluation of Single-Cell Foundation Model Interpretability Reveals Attention Captures Co-Expression Rather Than Unique Regulatory Signal
LLMs

This article evaluates the interpretability of single-cell foundation models, revealing that attention mechanisms capture co-expression r...

arXiv - AI · 3 min
[2602.17531] Position: Evaluation of ECG Representations Must Be Fixed
AI Startups

This paper critiques current benchmarking practices in 12-lead ECG representation learning, advocating for broader evaluation criteria to...

arXiv - AI · 4 min
[2602.17493] Learning with Boolean threshold functions
Machine Learning

This article presents a novel method for training neural networks on Boolean data using Boolean threshold functions (BTF), demonstrating ...

arXiv - AI · 4 min
[2602.17452] Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge
Machine Learning

Jolt Atlas introduces a zero-knowledge machine learning framework that enhances inference verification through lookup arguments, optimizi...

arXiv - AI · 4 min
[2602.17483] What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data
LLMs

This article presents a human-centered audit of how large language models (LLMs) associate personal data with individual names, highlight...

arXiv - AI · 3 min
[2602.17431] Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study
LLMs

This study presents a taxonomy for fine-grained uncertainty quantification in long-form language model outputs, highlighting effective me...

arXiv - Machine Learning · 3 min
[2602.17423] Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
Machine Learning

This paper explores the convergence of two-layer neural networks trained with Gaussian masked inputs, demonstrating linear convergence th...

arXiv - AI · 3 min

