AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software, reportedly citing ethics concerns, privacy worries, and doubts that the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
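For context on the mechanism the post above describes: RLHF reward models are commonly trained with a pairwise Bradley-Terry preference loss, which pushes the score of the response raters preferred above the score of the one they rejected. A minimal sketch in PyTorch (tensors and values here are illustrative, not from the post):

    import torch
    import torch.nn.functional as F

    def reward_model_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
        # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
        # The model learns whatever raters reward, so if raters favor
        # confident, fluent answers, the reward model does too.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Illustrative usage: reward scores for preferred vs. rejected responses
    chosen = torch.tensor([1.2, 0.3, 2.1])
    rejected = torch.tensor([0.4, 0.5, 1.0])
    loss = reward_model_loss(chosen, rejected)
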
AI Safety

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2512.20363] Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning

The paper presents Clust-PSI-PFL, a novel framework for personalized federated learning that addresses challenges posed by non-IID data t...

arXiv - AI · 4 min ·
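The Population Stability Index in the title above is a standard distribution-drift statistic, PSI = Σ_i (p_i − q_i) · ln(p_i / q_i) over shared bins. A minimal sketch of how it could score drift between two clients' label histograms as a basis for clustering; the binning and variable names are illustrative, not the paper's:

    import numpy as np

    def psi(expected: np.ndarray, actual: np.ndarray, eps: float = 1e-6) -> float:
        # Population Stability Index between two histograms over the same bins;
        # larger values mean the two distributions have drifted further apart.
        p = np.clip(expected / expected.sum(), eps, None)
        q = np.clip(actual / actual.sum(), eps, None)
        return float(np.sum((p - q) * np.log(p / q)))

    # Illustrative usage: label-count histograms from two federated clients
    client_a = np.array([50.0, 30.0, 20.0])
    client_b = np.array([10.0, 10.0, 80.0])
    drift = psi(client_a, client_b)  # high PSI suggests the clients belong in different clusters
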
LLMs

[2501.17860] Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations

This article presents a novel approach to training medical large language models (LLMs) through dialogue-based fine-tuning, improving the...

arXiv - AI · 3 min ·
LLMs

[2511.04934] Leak@k: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding

The paper discusses the limitations of current unlearning methods in large language models (LLMs), revealing that they fail to effectivel...

arXiv - Machine Learning · 4 min ·
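The Leak@k in the title above reads like a sampling-based analogue of pass@k. A hypothetical sketch of such a metric, assuming the natural definition (sample k completions per prompt under stochastic decoding, and count a prompt as leaked if any sample reveals supposedly unlearned content); model.sample and leaks_target are stand-ins, not the paper's API:

    def leak_at_k(model, prompts, leaks_target, k: int = 20,
                  temperature: float = 1.0) -> float:
        # Fraction of prompts where at least one of k sampled completions
        # still reveals content the model was supposed to have unlearned.
        leaked = 0
        for prompt in prompts:
            samples = [model.sample(prompt, temperature=temperature)
                       for _ in range(k)]
            if any(leaks_target(prompt, s) for s in samples):  # one leak suffices
                leaked += 1
        return leaked / len(prompts)
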
Machine Learning

[2510.16703] On the Granularity of Causal Effect Identifiability

This paper explores the concept of causal effect identifiability, focusing on state-based effects and how they can be identifiable even w...

arXiv - AI · 3 min ·
Machine Learning

[2510.13205] CleverCatch: A Knowledge-Guided Weak Supervision Model for Fraud Detection

CleverCatch introduces a knowledge-guided weak supervision model for detecting healthcare fraud, enhancing accuracy and interpretability ...

arXiv - AI · 4 min ·
LLMs

[2510.11390] Medical Interpretability and Knowledge Maps of Large Language Models

This article presents a systematic study of medical interpretability in Large Language Models (LLMs), exploring how these models process ...

arXiv - AI · 4 min ·
Machine Learning

[2510.03734] Cost Efficient Fairness Audit Under Partial Feedback

This paper presents a cost-efficient approach to auditing fairness in classifiers under partial feedback, proposing algorithms that outpe...

arXiv - AI · 4 min ·
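For context on what such an audit measures, a minimal sketch of one standard fairness quantity, the demographic-parity gap; the paper's partial-feedback setting and its actual algorithms are not reproduced here, and the names below are illustrative:

    import numpy as np

    def demographic_parity_gap(predictions: np.ndarray, groups: np.ndarray) -> float:
        # Absolute difference in positive-prediction rates between two groups.
        # An audit estimates this from samples; under partial feedback each
        # labeled sample is costly, hence the push for query-efficient audits.
        rate_a = predictions[groups == 0].mean()
        rate_b = predictions[groups == 1].mean()
        return abs(float(rate_a - rate_b))

    # Illustrative usage: binary classifier outputs and group membership
    preds = np.array([1, 0, 1, 1, 0, 1])
    groups = np.array([0, 0, 0, 1, 1, 1])
    gap = demographic_parity_gap(preds, groups)  # 0 would mean parity on this sample
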
LLMs

[2510.03346] KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

The paper introduces KVComm, a novel framework for efficient communication between Large Language Models (LLMs) using selective KV pair s...

arXiv - AI · 4 min ·
Machine Learning

[2602.11298] Voxtral Realtime

Voxtral Realtime presents a novel streaming automatic speech recognition model achieving offline transcription quality with sub-second la...

arXiv - AI · 5 min ·
Machine Learning

[2602.08354] Does Your Reasoning Model Implicitly Know When to Stop Thinking?

This article explores how large reasoning models (LRMs) can implicitly determine when to stop processing information, introducing a new s...

arXiv - AI · 4 min ·
AI Agents

[2602.08104] Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems

This paper presents a novel framework for interpretable failure analysis in Multi-Agent Reinforcement Learning (MARL) systems, focusing o...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2510.00502] Diffusion Alignment as Variational Expectation-Maximization

The paper introduces Diffusion Alignment as Variational Expectation-Maximization (DAV), a novel framework that optimizes diffusion models...

arXiv - Machine Learning · 3 min ·
AI Safety

[2509.25210] STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting

The paper introduces STCast, an AI-driven framework for adaptive boundary alignment in weather forecasting, enhancing regional forecasts ...

arXiv - AI · 4 min ·
AI Safety

[2602.07754] Humanizing AI Grading: Student-Centered Insights on Fairness, Trust, Consistency and Transparency

This study explores student perceptions of AI grading systems, focusing on fairness, trust, consistency, and transparency in an undergrad...

arXiv - AI · 3 min ·
Machine Learning

[2602.03003] Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment

The paper discusses differentiable social choice, a framework integrating machine learning with social choice theory, identifying 18 open...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2601.18467] OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

The paper introduces OffSeeker, a model demonstrating that offline training can effectively replace costly online reinforcement learning ...

arXiv - Machine Learning · 4 min ·
LLMs

[2601.05500] The Illusion of Human AI Parity Under Uncertainty: Navigating Elusive Ground Truth via a Probabilistic Paradigm

This paper discusses the impact of uncertainty in ground truth evaluations on AI performance assessments, proposing a probabilistic frame...

arXiv - AI · 4 min ·
Machine Learning

[2509.21655] DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models

The paper presents DriftLite, a lightweight approach for inference-time scaling of diffusion models, enhancing adaptation to new distribu...

arXiv - Machine Learning · 3 min ·
Robotics

[2509.20648] Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration

This paper presents CERMIC, a novel framework for enhancing multi-agent exploration in reinforcement learning by calibrating intrinsic cu...

arXiv - Machine Learning · 4 min ·
Robotics

[2512.20798] A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

This paper introduces a benchmark for evaluating outcome-driven constraint violations in autonomous AI agents, highlighting safety concer...

arXiv - AI · 4 min ·