AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

arXiv - Machine Learning · 4 min
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

arXiv - Machine Learning · 4 min
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

arXiv - Machine Learning · 4 min

All Content

[2602.21420] Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning
LLMs

This paper introduces the Asymmetric Confidence-aware Error Penalty (ACE) to enhance reinforcement learning by addressing overconfident e...

arXiv - Machine Learning · 4 min
[2602.21374] Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages
LLMs

This study explores the use of small language models for extracting clinical information from low-resource languages, focusing on a priva...

arXiv - Machine Learning · 4 min
[2602.21372] The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
Machine Learning

This article presents an entropy-adaptive model merging technique for medical imaging that addresses challenges posed by heterogeneous do...

arXiv - Machine Learning · 4 min
[2602.21368] Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
AI Infrastructure

This paper presents a method for certifying the reliability of black-box AI systems using self-consistency sampling and conformal calibra...

arXiv - Machine Learning · 3 min
[2602.21346] Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
LLMs

This article presents a novel approach to enhance safety alignment in large language models (LLMs) through Alignment-Weighted Direct Pref...

arXiv - AI · 4 min
[2602.21327] Equitable Evaluation via Elicitation
AI Startups

The paper discusses an AI-driven approach for equitable skill evaluation, addressing biases in self-presentation among job seekers. It pr...

arXiv - Machine Learning · 3 min
[2602.21269] Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
LLMs

The paper introduces Group Orthogonalized Policy Optimization (GOPO), a novel algorithm for aligning large language models using Hilbert ...

arXiv - Machine Learning · 4 min
[2602.21267] A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications
AI Safety

This systematic review explores automated red teaming methodologies for enhancing the security of AI applications, addressing the limitat...

arXiv - AI · 3 min
[2602.21231] ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
LLMs

The paper presents ACAR, a framework for adaptive complexity routing in multi-model ensembles, demonstrating improved task routing accura...

arXiv - Machine Learning · 4 min
[2602.21226] IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions
LLMs

The paper introduces IslamicLegalBench, a benchmark for evaluating LLMs' reasoning on Islamic law, revealing significant limitations in c...

arXiv - AI · 4 min
[2602.21218] EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
LLMs

The paper introduces EPSVec, a novel method for generating synthetic data using dataset vectors, enhancing privacy and efficiency in mach...

arXiv - Machine Learning · 4 min
[2602.21217] Applied Sociolinguistic AI for Community Development (ASA-CD): A New Scientific Paradigm for Linguistically-Grounded Social Intervention
NLP

The paper introduces Applied Sociolinguistic AI for Community Development (ASA-CD), a paradigm that leverages AI and linguistics to addre...

arXiv - AI · 3 min
[2602.21215] Inference-time Alignment via Sparse Junction Steering
LLMs

This paper presents Sparse Inference-time Alignment (SIA), a novel approach to enhance alignment in large language models by intervening ...

arXiv - AI · 4 min
[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
LLMs

This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision...

arXiv - AI · 4 min
[2602.22094] Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
AI Agents

This paper presents a novel approach using Petri nets to identify infeasibilities in sequential task planning, enhancing robustness and e...

arXiv - AI · 3 min
[2602.21889] 2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Machine Learning

The paper presents the 2-Step Agent framework, which models the interaction between decision makers and AI decision support systems, high...

arXiv - Machine Learning · 3 min
[2602.21857] Distill and Align Decomposition for Enhanced Claim Verification
AI Safety

This paper presents a novel reinforcement learning approach to enhance claim verification by optimizing decomposition quality and verifie...

arXiv - Machine Learning · 3 min
[2602.21746] fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
Machine Learning

The paper presents fEDM+, an enhanced fuzzy ethical decision-making framework that improves explainability and validation by integrating ...

arXiv - AI · 4 min
[2602.21745] The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Machine Learning

The ASIR Courage Model presents a phase-dynamic framework for understanding truth transitions in both human and AI systems, emphasizing t...

arXiv - AI · 4 min
[2602.21556] Power and Limitations of Aggregation in Compound AI Systems
Machine Learning

The paper explores the effectiveness of aggregating outputs from multiple AI models in compound AI systems, examining its potential to en...

arXiv - AI · 4 min