AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2601.15356] Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
LLMs

arXiv - AI · 4 min
[2510.18196] Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
LLMs

arXiv - AI · 3 min
[2509.23435] AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
LLMs

arXiv - AI · 4 min

All Content

[2602.12851] Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence
Machine Learning

The paper presents Chimera, a framework that integrates neuro-symbolic attention mechanisms into programmable dataplanes, enhancing traff...

arXiv - AI · 3 min
[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models
LLMs

The paper presents Amortized Reasoning Tree Search (ARTS), a novel approach to enhance reasoning in Large Language Models by decoupling p...

arXiv - Machine Learning · 4 min
[2602.12806] RAT-Bench: A Comprehensive Benchmark for Text Anonymization
LLMs

RAT-Bench introduces a comprehensive benchmark for evaluating text anonymization tools based on their effectiveness in preventing re-iden...

arXiv - Machine Learning · 4 min
[2602.12783] SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
NLP

The paper introduces SQuTR, a new benchmark for evaluating the robustness of spoken query retrieval systems under various acoustic noise ...

arXiv - AI · 4 min
[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
LLMs

MedXIAOHE is a medical vision-language foundation model that enhances medical understanding and reasoning in clinical applications, achie...

arXiv - AI · 3 min
[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
LLMs

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min
[2602.12630] TensorCommitments: A Lightweight Verifiable Inference for Language Models
LLMs

The paper introduces TensorCommitments, a novel proof-of-inference scheme designed to enhance the security of large language model (LLM) ...

arXiv - AI · 3 min
[2602.12592] Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power Systems
Machine Learning

The paper presents Power Interpretable Causal ODE Networks (PICODE), a novel model for explainable anomaly detection and root cause analy...

arXiv - Machine Learning · 4 min
[2602.12500] Favia: Forensic Agent for Vulnerability-fix Identification and Analysis
LLMs

The paper presents Favia, a forensic agent designed to identify and analyze vulnerability-fixing commits in software repositories, improv...

arXiv - AI · 4 min
[2602.12476] Not a Silver Bullet for Loneliness: How Attachment and Age Shape Intimacy with AI Companions
AI Agents

This article explores how attachment styles and age influence the intimacy users develop with AI companions, challenging the notion that ...

arXiv - AI · 4 min
[2602.12463] Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof
AI Safety

This paper examines the relationship between correctness in mathematical proofs and their epistemic value, arguing that formal correctnes...

arXiv - AI · 3 min
[2602.12444] Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
Machine Learning

This paper presents a novel recovery-based shielding framework for safe reinforcement learning (RL) using Gaussian process dynamics model...

arXiv - AI · 3 min
[2602.12430] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
LLMs

This paper discusses the evolution of large language models (LLMs) into modular agents equipped with skills, emphasizing architecture, ac...

arXiv - AI · 4 min
[2602.12413] Soft Contamination Means Benchmarks Test Shallow Generalization
LLMs

This paper explores how soft contamination in training data affects the evaluation of large language models (LLMs) on benchmarks, reveali...

arXiv - Machine Learning · 3 min
[2602.12384] Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
Machine Learning

This paper explores the mechanisms behind the implicit bias in gradient-based training of deep networks, focusing on the scaling and alig...

arXiv - AI · 4 min
[2602.12373] Policy4OOD: A Knowledge-Guided World Model for Policy Intervention Simulation against the Opioid Overdose Crisis
Machine Learning

The paper presents Policy4OOD, a knowledge-guided world model designed to simulate policy interventions against the opioid overdose crisi...

arXiv - Machine Learning · 4 min
[2602.11247] Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection
LLMs

The paper presents a novel scoring formula, Peak + Accumulation, for detecting multi-turn LLM attack patterns, addressing limitations in ...

arXiv - AI · 4 min
[2602.12285] From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
LLMs

This article examines how demographic-based persona assignments in large language models (LLMs) can impact agent performance, revealing v...

arXiv - AI · 3 min
[2602.13166] Optimal Take-off under Fuzzy Clearances
AI Safety

This paper discusses a hybrid obstacle avoidance system for unmanned aircraft that combines optimal control with fuzzy logic to improve d...

arXiv - AI · 4 min
[2602.13135] Constrained Assumption-Based Argumentation Frameworks
AI Agents

This paper introduces Constrained Assumption-Based Argumentation (CABA), extending traditional Assumption-Based Argumentation frameworks ...

arXiv - AI · 3 min