AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

arXiv - Machine Learning · 4 min
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

arXiv - Machine Learning · 4 min
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

arXiv - Machine Learning · 4 min

All Content

[2602.21420] Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning
LLMs

This paper introduces the Asymmetric Confidence-aware Error Penalty (ACE) to enhance reinforcement learning by addressing overconfident e...

arXiv - Machine Learning · 4 min
[2602.21374] Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
LLMs

This study explores the use of small language models for extracting clinical information from low-resource languages, focusing on a priva...

arXiv - Machine Learning · 4 min
[2602.21372] The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
Machine Learning

This article presents an entropy-adaptive model merging technique for medical imaging that addresses challenges posed by heterogeneous do...

arXiv - Machine Learning · 4 min
[2602.21368] Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
AI Infrastructure

This paper presents a method for certifying the reliability of black-box AI systems using self-consistency sampling and conformal calibra...

arXiv - Machine Learning · 3 min
[2602.21346] Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
LLMs

This article presents a novel approach to enhance safety alignment in large language models (LLMs) through Alignment-Weighted Direct Pref...

arXiv - AI · 4 min
[2602.21327] Equitable Evaluation via Elicitation
AI Startups

The paper discusses an AI-driven approach for equitable skill evaluation, addressing biases in self-presentation among job seekers. It pr...

arXiv - Machine Learning · 3 min
[2602.21269] Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
LLMs

The paper introduces Group Orthogonalized Policy Optimization (GOPO), a novel algorithm for aligning large language models using Hilbert ...

arXiv - Machine Learning · 4 min
[2602.21267] A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications
AI Safety

This systematic review explores automated red teaming methodologies for enhancing the security of AI applications, addressing the limitat...

arXiv - AI · 3 min
[2602.21231] ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
LLMs

The paper presents ACAR, a framework for adaptive complexity routing in multi-model ensembles, demonstrating improved task routing accura...

arXiv - Machine Learning · 4 min
[2602.21226] IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions
LLMs

The paper introduces IslamicLegalBench, a benchmark for evaluating LLMs' reasoning on Islamic law, revealing significant limitations in c...

arXiv - AI · 4 min
[2602.21218] EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
LLMs

The paper introduces EPSVec, a novel method for generating synthetic data using dataset vectors, enhancing privacy and efficiency in mach...

arXiv - Machine Learning · 4 min
[2602.21217] Applied Sociolinguistic AI for Community Development (ASA-CD): A New Scientific Paradigm for Linguistically-Grounded Social Intervention
NLP

The paper introduces Applied Sociolinguistic AI for Community Development (ASA-CD), a paradigm that leverages AI and linguistics to addre...

arXiv - AI · 3 min
[2602.21215] Inference-time Alignment via Sparse Junction Steering
LLMs

This paper presents Sparse Inference-time Alignment (SIA), a novel approach to enhance alignment in large language models by intervening ...

arXiv - AI · 4 min
[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
LLMs

This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision...

arXiv - AI · 4 min
[2602.22094] Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
AI Agents

This paper presents a novel approach using Petri nets to identify infeasibilities in sequential task planning, enhancing robustness and e...

arXiv - AI · 3 min
[2602.21889] 2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Machine Learning

The paper presents the 2-Step Agent framework, which models the interaction between decision makers and AI decision support systems, high...

arXiv - Machine Learning · 3 min
[2602.21857] Distill and Align Decomposition for Enhanced Claim Verification
AI Safety

This paper presents a novel reinforcement learning approach to enhance claim verification by optimizing decomposition quality and verifie...

arXiv - Machine Learning · 3 min
[2602.21746] fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
Machine Learning

The paper presents fEDM+, an enhanced fuzzy ethical decision-making framework that improves explainability and validation by integrating ...

arXiv - AI · 4 min
[2602.21745] The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Machine Learning

The ASIR Courage Model presents a phase-dynamic framework for understanding truth transitions in both human and AI systems, emphasizing t...

arXiv - AI · 4 min
[2602.21556] Power and Limitations of Aggregation in Compound AI Systems
Machine Learning

The paper explores the effectiveness of aggregating outputs from multiple AI models in compound AI systems, examining its potential to en...

arXiv - AI · 4 min