AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

arXiv - Machine Learning · 4 min

[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

arXiv - Machine Learning · 4 min

[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

arXiv - Machine Learning · 4 min

All Content

[2602.22146] Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual
LLMs

This article presents a novel optimistic primal-dual framework for safe reinforcement learning from human feedback (RLHF) in large language models.

arXiv - Machine Learning · 4 min

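The mechanics behind this kind of method can be sketched quickly. Below is a toy optimistic primal-dual loop on a generic constrained objective, not the paper's algorithm: the reward and cost functions stand in for learned reward/safety models, and the budget b, step sizes, and extrapolated dual variable are all assumptions made for the sketch.

    import numpy as np

    # Toy constrained problem standing in for safe RLHF: maximize a reward
    # proxy subject to a safety-cost budget b. All names are illustrative.
    reward = lambda th: -np.sum((th - 1.0) ** 2)
    cost = lambda th: np.sum(th ** 2)
    grad = lambda f, th: np.array([(f(th + e) - f(th - e)) / 2e-5
                                   for e in np.eye(th.size) * 1e-5])

    b, eta, xi = 1.0, 0.05, 0.05                # budget, primal step, dual step
    th, lam, lam_prev = np.zeros(3), 0.0, 0.0
    for _ in range(500):
        lam_opt = max(0.0, 2 * lam - lam_prev)  # optimistic (extrapolated) dual
        # Primal ascent on the Lagrangian L = reward - lam_opt * (cost - b)
        th = th + eta * (grad(reward, th) - lam_opt * grad(cost, th))
        # Dual ascent: increase lam while the safety budget is violated
        lam_prev, lam = lam, max(0.0, lam + xi * (cost(th) - b))

Last-iterate convergence results concern this final pair (th, lam) settling at the constrained optimum, rather than only the average of the iterates doing so.
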
[2602.22145] When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
LLMs

This article explores the phenomenon of 'Cultural Ghosting' in large language models (LLMs), highlighting the systematic erasure of cultural markers across World English varieties.

arXiv - AI · 4 min

[2602.22144] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
LLMs

The paper presents NoLan, a framework aimed at reducing object hallucinations in Large Vision-Language Models (LVLMs) by dynamically suppressing language priors.

arXiv - AI · 4 min

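The general recipe in this family of methods is contrastive (a hedged sketch of the idea, not necessarily NoLan's exact rule): compare the logits the model produces with the image against logits from the language prior alone, and damp tokens the prior alone would push.

    import torch

    def suppress_language_prior(logits_img, logits_text_only, alpha=1.0):
        # Contrastive adjustment: up-weight tokens supported by the visual
        # input and damp those driven purely by the text-only language prior.
        # alpha is an illustrative knob; a "dynamic" variant would set it per
        # decoding step based on how image-dependent the prediction is.
        return ((1 + alpha) * torch.log_softmax(logits_img, dim=-1)
                - alpha * torch.log_softmax(logits_text_only, dim=-1))
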
[2602.22072] Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
LLMs

This article explores the robustness of Theory of Mind (ToM) in large language models (LLMs) through perturbation tasks, revealing signif...

arXiv - AI · 3 min

[2602.21939] Hidden Topics: Measuring Sensitive AI Beliefs with List Experiments
LLMs

This paper explores how list experiments can be used to uncover hidden beliefs in large language models (LLMs), revealing concerning appr...

arXiv - AI · 3 min

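The list-experiment trick itself fits in a few lines: the respondent (here, the model) only ever reports a count of endorsed items, and the sensitive item's endorsement rate falls out as a difference in means. The numbers below are made up for illustration.

    import numpy as np

    # Control group sees J baseline items; treatment sees the same J plus
    # one sensitive item. Each response is just a count of endorsed items,
    # so no individual response reveals the sensitive answer directly.
    control_counts = np.array([2, 1, 3, 2, 2])   # counts over J items
    treated_counts = np.array([3, 2, 3, 3, 2])   # counts over J + 1 items

    # Difference-in-means estimate of the sensitive item's endorsement rate.
    prevalence = treated_counts.mean() - control_counts.mean()
    print(f"estimated endorsement rate: {prevalence:.2f}")   # 0.60
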
[2602.21841] Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning
Machine Learning

The paper presents the Resilient Federated Chain (RFC), a blockchain-enabled framework designed to enhance the security of Federated Learning.

arXiv - AI · 4 min

[2602.21845] xai-cola: A Python library for sparsifying counterfactual explanations
AI Infrastructure

The article introduces xai-cola, an open-source Python library designed to sparsify counterfactual explanations, enhancing interpretability.

arXiv - Machine Learning · 3 min

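Independent of xai-cola's actual API (which this sketch does not use), "sparsifying" a counterfactual usually means undoing as many feature edits as possible while keeping the flipped prediction. A minimal greedy version:

    import numpy as np

    def sparsify(x, x_cf, predict, target):
        """Greedily revert edited features to their original values
        whenever the counterfactual class `target` is preserved."""
        x_sparse = x_cf.copy()
        for i in range(x.size):
            if x_sparse[i] != x[i]:
                trial = x_sparse.copy()
                trial[i] = x[i]                 # undo this feature edit
                if predict(trial) == target:    # still flipped? keep the undo
                    x_sparse = trial
        return x_sparse

Fewer changed features generally makes the explanation more actionable, which is the interpretability gain the summary alludes to.
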
[2602.21829] StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
Machine Learning

The paper introduces StoryMovie, a dataset designed for aligning visual stories with movie scripts and subtitles, enhancing dialogue attribution.

arXiv - AI · 3 min

[2602.21779] Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models
LLMs

This paper introduces a forensic benchmark for evaluating video deepfake reasoning in vision-language models, focusing on temporal inconsistencies.

arXiv - AI · 4 min

[2602.21765] Generalisation of RLHF under Reward Shift and Clipped KL Regularisation
LLMs

This paper explores the generalization of Reinforcement Learning from Human Feedback (RLHF) under conditions of reward shift and clipped KL regularisation.

arXiv - Machine Learning · 4 min

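One plausible reading of "clipped KL regularisation" (an assumption for illustration; the paper may define it differently) caps the per-prompt KL penalty against a reference policy π_ref at a threshold κ:

    J(\pi) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
             - \beta \, \mathbb{E}_{x \sim \mathcal{D}}\Big[ \min\big( \mathrm{KL}\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big),\; \kappa \big) \Big]

Here β is the usual KL weight; clipping at κ stops a handful of prompts with badly shifted rewards from dominating the regularisation term.
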
[2602.21720] Evaluating the relationship between regularity and learnability in recursive numeral systems using Reinforcement Learning
AI Safety

This article explores the relationship between regularity and learnability in recursive numeral systems using Reinforcement Learning, dem...

arXiv - AI · 3 min

[2602.21704] Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
LLMs

This paper presents Dynamic Multimodal Activation Steering, a novel approach to mitigate hallucinations in Large Vision-Language Models (LVLMs).

arXiv - AI · 3 min

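As a rough sketch of activation steering in general (not this paper's specific method), a steering direction can be added to a chosen layer's hidden states at inference time; a dynamic variant would pick the scale per input. The layer index and HuggingFace-style module path below are assumptions.

    import torch

    def make_steering_hook(direction, scale=4.0):
        # Forward hook that shifts a layer's hidden states along a fixed
        # steering direction; a dynamic scheme would choose scale per input.
        direction = direction / direction.norm()
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            d = direction.to(device=hidden.device, dtype=hidden.dtype)
            steered = hidden + scale * d
            return (steered,) + output[1:] if isinstance(output, tuple) else steered
        return hook

    # Usage on a hypothetical HuggingFace-style decoder (module path assumed):
    # handle = model.model.layers[20].register_forward_hook(make_steering_hook(v))
    # out = model.generate(**inputs); handle.remove()
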
[2602.21613] Virtual Biopsy for Intracranial Tumors Diagnosis on MRI
AI Safety

This article presents a novel Virtual Biopsy framework for diagnosing intracranial tumors using MRI, addressing the challenges of traditional biopsies.

arXiv - AI · 4 min

[2602.21584] Exploring Human-Machine Coexistence in Symmetrical Reality
AI Safety

This paper explores the evolving relationship between humans and AI, proposing a framework for harmonious coexistence termed 'symmetrical reality'.

arXiv - AI · 3 min

[2602.21543] Enhancing Multilingual Embeddings via Multi-Way Parallel Text Alignment
Machine Learning

This paper presents a method for enhancing multilingual embeddings through multi-way parallel text alignment, demonstrating improved cross-lingual performance.

arXiv - AI · 3 min

[2602.21515] Training Generalizable Collaborative Agents via Strategic Risk Aversion
Machine Learning

This paper explores training strategies for collaborative agents, emphasizing strategic risk aversion to enhance generalizability and robustness.

arXiv - Machine Learning · 4 min

[2602.21452] Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound
Machine Learning

This article evaluates the adversarial robustness of deep learning models for thyroid nodule segmentation in ultrasound images, highlight...

arXiv - AI · 4 min

[2602.21447] Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG
Machine Learning

The paper presents a novel framework, MMA-RAG^T, for enhancing the security of multimodal agentic retrieval-augmented generation systems.

arXiv - Machine Learning · 4 min

[2602.21442] MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
LLMs

The paper introduces MINAR, a toolbox for mechanistic interpretability in neural algorithmic reasoning, enhancing understanding of GNNs' algorithmic reasoning.

arXiv - Machine Learning · 3 min

[2602.21429] Provably Safe Generative Sampling with Constricting Barrier Functions
Machine Learning

This paper presents a safety filtering framework for generative models, ensuring generated samples meet hard constraints while minimizing...

arXiv - Machine Learning · 4 min

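The core idea of such a safety filter can be sketched with a generic barrier function h defining the safe set {x : h(x) >= 0} (a toy construction; the paper's "constricting" barriers are presumably more refined):

    import numpy as np

    def filter_sample(x, h, grad_h, step=0.1, iters=100):
        # Push a generated sample toward the safe set {x : h(x) >= 0}
        # by ascending the barrier value while the constraint is violated.
        for _ in range(iters):
            if h(x) >= 0:
                return x          # already satisfies the hard constraint
            x = x + step * grad_h(x)
        return x

    # Example hard constraint: stay inside the unit ball, h(x) = 1 - ||x||^2.
    h = lambda x: 1.0 - x @ x
    grad_h = lambda x: -2.0 * x
    safe = filter_sample(np.array([1.5, 0.5]), h, grad_h)   # scaled inward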

