AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents and am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·
AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·

All Content

LLMs

[2602.12968] RGAlign-Rec: Ranking-Guided Alignment for Latent Query Reasoning in Recommendation Systems

The RGAlign-Rec framework enhances proactive intent prediction in e-commerce chatbots by aligning latent query reasoning with ranking obj...

arXiv - AI · 4 min ·
Data Science

[2602.12917] Ultrasound-Guided Real-Time Spinal Motion Visualization for Spinal Instability Assessment

This article presents a novel ultrasound-guided method for real-time 3D visualization of spinal motion to assess spinal instability, aimi...

arXiv - AI · 4 min ·
Machine Learning

[2602.12902] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions

This paper evaluates the robustness of object detection models used in autonomous vehicles under adverse weather conditions, proposing a ...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12892] RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training

The paper presents RADAR, a novel evaluation framework for Multi-modal Large Language Models (MLLMs) that addresses performance bottlenec...

arXiv - AI · 4 min ·
LLMs

[2602.12873] Knowledge-Based Design Requirements for Generative Social Robots in Higher Education

The article explores design requirements for generative social robots in higher education, emphasizing the need for knowledge-based frame...

arXiv - AI · 3 min ·
Machine Learning

[2602.12851] Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence

The paper presents Chimera, a framework that integrates neuro-symbolic attention mechanisms into programmable dataplanes, enhancing traff...

arXiv - AI · 3 min ·
LLMs

[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models

The paper presents Amortized Reasoning Tree Search (ARTS), a novel approach to enhance reasoning in Large Language Models by decoupling p...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12806] RAT-Bench: A Comprehensive Benchmark for Text Anonymization

RAT-Bench introduces a comprehensive benchmark for evaluating text anonymization tools based on their effectiveness in preventing re-iden...

arXiv - Machine Learning · 4 min ·
NLP

[2602.12783] SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

The paper introduces SQuTR, a new benchmark for evaluating the robustness of spoken query retrieval systems under various acoustic noise ...

arXiv - AI · 4 min ·
LLMs

[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

MedXIAOHE is a medical vision-language foundation model that enhances medical understanding and reasoning in clinical applications, achie...

arXiv - AI · 3 min ·
LLMs

[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min ·
LLMs

[2602.12630] TensorCommitments: A Lightweight Verifiable Inference for Language Models

The paper introduces TensorCommitments, a novel proof-of-inference scheme designed to enhance the security of large language model (LLM) ...

arXiv - AI · 3 min ·
Machine Learning

[2602.12592] Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power Systems

The paper presents Power Interpretable Causal ODE Networks (PICODE), a novel model for explainable anomaly detection and root cause analy...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12500] Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

The paper presents Favia, a forensic agent designed to identify and analyze vulnerability-fixing commits in software repositories, improv...

arXiv - AI · 4 min ·
AI Agents

[2602.12476] Not a Silver Bullet for Loneliness: How Attachment and Age Shape Intimacy with AI Companions

This article explores how attachment styles and age influence the intimacy users develop with AI companions, challenging the notion that ...

arXiv - AI · 4 min ·
AI Safety

[2602.12463] Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof

This paper examines the relationship between correctness in mathematical proofs and their epistemic value, arguing that formal correctnes...

arXiv - AI · 3 min ·
Machine Learning

[2602.12444] Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models

This paper presents a novel recovery-based shielding framework for safe reinforcement learning (RL) using Gaussian process dynamics model...

arXiv - AI · 3 min ·
LLMs

[2602.12430] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

This paper discusses the evolution of large language models (LLMs) into modular agents equipped with skills, emphasizing architecture, ac...

arXiv - AI · 4 min ·
LLMs

[2602.12413] Soft Contamination Means Benchmarks Test Shallow Generalization

This paper explores how soft contamination in training data affects the evaluation of large language models (LLMs) on benchmarks, reveali...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.12384] Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment

This paper explores the mechanisms behind the implicit bias in gradient-based training of deep networks, focusing on the scaling and alig...

arXiv - AI · 4 min ·