AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2601.15356] Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
LLMs

arXiv - AI · 4 min
[2510.18196] Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge
LLMs

arXiv - AI · 3 min
[2509.23435] AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
LLMs

arXiv - AI · 4 min

All Content

[2602.12851] Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence
Machine Learning

The paper presents Chimera, a framework that integrates neuro-symbolic attention mechanisms into programmable dataplanes, enhancing traff...

arXiv - AI · 3 min
[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models
LLMs

The paper presents Amortized Reasoning Tree Search (ARTS), a novel approach to enhance reasoning in Large Language Models by decoupling p...

arXiv - Machine Learning · 4 min
[2602.12806] RAT-Bench: A Comprehensive Benchmark for Text Anonymization
LLMs

RAT-Bench introduces a comprehensive benchmark for evaluating text anonymization tools based on their effectiveness in preventing re-iden...

arXiv - Machine Learning · 4 min
[2602.12783] SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
NLP

The paper introduces SQuTR, a new benchmark for evaluating the robustness of spoken query retrieval systems under various acoustic noise ...

arXiv - AI · 4 min
[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
LLMs

MedXIAOHE is a medical vision-language foundation model that enhances medical understanding and reasoning in clinical applications, achie...

arXiv - AI · 3 min
[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
LLMs

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min
[2602.12630] TensorCommitments: A Lightweight Verifiable Inference for Language Models
LLMs

The paper introduces TensorCommitments, a novel proof-of-inference scheme designed to enhance the security of large language model (LLM) ...

arXiv - AI · 3 min
[2602.12592] Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power Systems
Machine Learning

The paper presents Power Interpretable Causal ODE Networks (PICODE), a novel model for explainable anomaly detection and root cause analy...

arXiv - Machine Learning · 4 min
[2602.12500] Favia: Forensic Agent for Vulnerability-fix Identification and Analysis
LLMs

The paper presents Favia, a forensic agent designed to identify and analyze vulnerability-fixing commits in software repositories, improv...

arXiv - AI · 4 min
[2602.12476] Not a Silver Bullet for Loneliness: How Attachment and Age Shape Intimacy with AI Companions
AI Agents

This article explores how attachment styles and age influence the intimacy users develop with AI companions, challenging the notion that ...

arXiv - AI · 4 min
[2602.12463] Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof
AI Safety

This paper examines the relationship between correctness in mathematical proofs and their epistemic value, arguing that formal correctnes...

arXiv - AI · 3 min
[2602.12444] Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
Machine Learning

This paper presents a novel recovery-based shielding framework for safe reinforcement learning (RL) using Gaussian process dynamics model...

arXiv - AI · 3 min
[2602.12430] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
LLMs

This paper discusses the evolution of large language models (LLMs) into modular agents equipped with skills, emphasizing architecture, ac...

arXiv - AI · 4 min
[2602.12413] Soft Contamination Means Benchmarks Test Shallow Generalization
LLMs

This paper explores how soft contamination in training data affects the evaluation of large language models (LLMs) on benchmarks, reveali...

arXiv - Machine Learning · 3 min
[2602.12384] Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
Machine Learning

This paper explores the mechanisms behind the implicit bias in gradient-based training of deep networks, focusing on the scaling and alig...

arXiv - AI · 4 min
[2602.12373] Policy4OOD: A Knowledge-Guided World Model for Policy Intervention Simulation against the Opioid Overdose Crisis
Machine Learning

The paper presents Policy4OOD, a knowledge-guided world model designed to simulate policy interventions against the opioid overdose crisi...

arXiv - Machine Learning · 4 min
[2602.11247] Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection
LLMs

The paper presents a novel scoring formula, Peak + Accumulation, for detecting multi-turn LLM attack patterns, addressing limitations in ...

arXiv - AI · 4 min
[2602.12285] From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
LLMs

This article examines how demographic-based persona assignments in large language models (LLMs) can impact agent performance, revealing v...

arXiv - AI · 3 min
[2602.13166] Optimal Take-off under Fuzzy Clearances
AI Safety

This paper discusses a hybrid obstacle avoidance system for unmanned aircraft that combines optimal control with fuzzy logic to improve d...

arXiv - AI · 4 min
[2602.13135] Constrained Assumption-Based Argumentation Frameworks
AI Agents

This paper introduces Constrained Assumption-Based Argumentation (CABA), extending traditional Assumption-Based Argumentation frameworks ...

arXiv - AI · 3 min