AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents and am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·
AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·

All Content

LLMs

[2602.12968] RGAlign-Rec: Ranking-Guided Alignment for Latent Query Reasoning in Recommendation Systems

The RGAlign-Rec framework enhances proactive intent prediction in e-commerce chatbots by aligning latent query reasoning with ranking obj...

arXiv - AI · 4 min ·
Data Science

[2602.12917] Ultrasound-Guided Real-Time Spinal Motion Visualization for Spinal Instability Assessment

This article presents a novel ultrasound-guided method for real-time 3D visualization of spinal motion to assess spinal instability, aimi...

arXiv - AI · 4 min ·
Machine Learning

[2602.12902] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions

This paper evaluates the robustness of object detection models used in autonomous vehicles under adverse weather conditions, proposing a ...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12892] RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training

The paper presents RADAR, a novel evaluation framework for Multi-modal Large Language Models (MLLMs) that addresses performance bottlenec...

arXiv - AI · 4 min ·
LLMs

[2602.12873] Knowledge-Based Design Requirements for Generative Social Robots in Higher Education

The article explores design requirements for generative social robots in higher education, emphasizing the need for knowledge-based frame...

arXiv - AI · 3 min ·
Machine Learning

[2602.12851] Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence

The paper presents Chimera, a framework that integrates neuro-symbolic attention mechanisms into programmable dataplanes, enhancing traff...

arXiv - AI · 3 min ·
LLMs

[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models

The paper presents Amortized Reasoning Tree Search (ARTS), a novel approach to enhance reasoning in Large Language Models by decoupling p...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12806] RAT-Bench: A Comprehensive Benchmark for Text Anonymization

RAT-Bench introduces a comprehensive benchmark for evaluating text anonymization tools based on their effectiveness in preventing re-iden...

arXiv - Machine Learning · 4 min ·
NLP

[2602.12783] SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

The paper introduces SQuTR, a new benchmark for evaluating the robustness of spoken query retrieval systems under various acoustic noise ...

arXiv - AI · 4 min ·
LLMs

[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

MedXIAOHE is a medical vision-language foundation model that enhances medical understanding and reasoning in clinical applications, achie...

arXiv - AI · 3 min ·
LLMs

[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min ·
LLMs

[2602.12630] TensorCommitments: A Lightweight Verifiable Inference for Language Models

The paper introduces TensorCommitments, a novel proof-of-inference scheme designed to enhance the security of large language model (LLM) ...

arXiv - AI · 3 min ·
Machine Learning

[2602.12592] Power Interpretable Causal ODE Networks: A Unified Model for Explainable Anomaly Detection and Root Cause Analysis in Power Systems

The paper presents Power Interpretable Causal ODE Networks (PICODE), a novel model for explainable anomaly detection and root cause analy...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.12500] Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

The paper presents Favia, a forensic agent designed to identify and analyze vulnerability-fixing commits in software repositories, improv...

arXiv - AI · 4 min ·
AI Agents

[2602.12476] Not a Silver Bullet for Loneliness: How Attachment and Age Shape Intimacy with AI Companions

This article explores how attachment styles and age influence the intimacy users develop with AI companions, challenging the notion that ...

arXiv - AI · 4 min ·
AI Safety

[2602.12463] Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof

This paper examines the relationship between correctness in mathematical proofs and their epistemic value, arguing that formal correctnes...

arXiv - AI · 3 min ·
Machine Learning

[2602.12444] Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models

This paper presents a novel recovery-based shielding framework for safe reinforcement learning (RL) using Gaussian process dynamics model...

arXiv - AI · 3 min ·
LLMs

[2602.12430] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

This paper discusses the evolution of large language models (LLMs) into modular agents equipped with skills, emphasizing architecture, ac...

arXiv - AI · 4 min ·
LLMs

[2602.12413] Soft Contamination Means Benchmarks Test Shallow Generalization

This paper explores how soft contamination in training data affects the evaluation of large language models (LLMs) on benchmarks, reveali...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.12384] Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment

This paper explores the mechanisms behind the implicit bias in gradient-based training of deep networks, focusing on the scaling and alig...

arXiv - AI · 4 min ·