AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min ·
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min ·
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min ·

All Content

[2602.16086] LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization
NLP

The paper presents LGQ, a novel image tokenizer that learns discretization geometry to enhance scalability and stability in visual genera...

arXiv - Machine Learning · 4 min ·
[2602.16201] Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications
LLMs

This paper explores the concept of long-tail knowledge in large language models (LLMs), analyzing its taxonomy, mechanisms of loss, and i...

arXiv - AI · 4 min ·
[2602.16061] Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
Machine Learning

This paper presents a novel framework for partial identification of population quantities under missing data, utilizing weak shadow varia...

arXiv - Machine Learning · 4 min ·
[2602.16187] SIT-LMPC: Safe Information-Theoretic Learning Model Predictive Control for Iterative Tasks
Machine Learning

The paper presents SIT-LMPC, a novel algorithm for safe information-theoretic learning model predictive control tailored for robots perfo...

arXiv - AI · 3 min ·
[2602.16154] Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution
LLMs

The paper presents REMUL, a multi-party reinforcement learning approach that enhances the faithfulness of reasoning in large language mod...

arXiv - AI · 4 min ·
[2602.16136] Retrieval Collapses When AI Pollutes the Web
LLMs

The paper discusses the phenomenon of 'Retrieval Collapse,' where AI-generated content dominates search results, leading to a decline in ...

arXiv - AI · 3 min ·
[2602.15927] Visual Memory Injection Attacks for Multi-Turn Conversations
LLMs

This article discusses Visual Memory Injection (VMI) attacks on large vision-language models (LVLMs) in multi-turn conversations, highlig...

arXiv - Machine Learning · 3 min ·
[2602.16109] Federated Graph AGI for Cross-Border Insider Threat Intelligence in Government Financial Schemes
AI Agents

The paper presents FedGraph-AGI, a federated learning framework designed to enhance cross-border insider threat detection in government f...

arXiv - AI · 4 min ·
[2602.16073] ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios
Machine Learning

The paper presents ScenicRules, a benchmark for evaluating autonomous driving systems that balances multiple objectives like safety and e...

arXiv - AI · 4 min ·
[2602.16085] Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs
LLMs

This article investigates the mental state reasoning of language models (LMs) using 41 open-weight models, revealing insights into their ...

arXiv - AI · 4 min ·
[2602.16033] Transforming GenAI Policy to Prompting Instruction: An RCT of Scalable Prompting Interventions in a CS1 Course
NLP

This article presents a randomized controlled trial (RCT) examining scalable prompting interventions in a CS1 course, highlighting the im...

arXiv - AI · 4 min ·
[2602.15894] Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity
LLMs

This paper presents Quality-constrained Entropy Maximization Policy Optimization (QEMPO), a method to enhance diversity in large language...

arXiv - Machine Learning · 3 min ·
[2602.15893] Statistical-Geometric Degeneracy in UAV Search: A Physics-Aware Asymmetric Filtering Approach
AI Safety

This article presents a novel approach to UAV search operations in post-disaster scenarios, addressing the challenges posed by Non-Line-o...

arXiv - Machine Learning · 4 min ·
[2602.16019] MedProbCLIP: Probabilistic Adaptation of Vision-Language Foundation Model for Reliable Radiograph-Report Retrieval
LLMs

The paper presents MedProbCLIP, a probabilistic framework for enhancing the reliability of radiograph-report retrieval using vision-langu...

arXiv - AI · 4 min ·
[2602.15983] ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization
LLMs

ReLoop introduces a structured approach to improve the reliability of LLM-generated optimization code by addressing silent failures throu...

arXiv - Machine Learning · 4 min ·
[2602.15968] From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Data Science

This article presents a scoping review of dataset documentation tools, analyzing motivations behind their design and factors affecting th...

arXiv - AI · 4 min ·
[2602.15959] Position-Aware Scene-Appearance Disentanglement for Bidirectional Photoacoustic Microscopy Registration
AI Safety

This paper presents GPEReg-Net, a novel framework for improving image registration in bidirectional photoacoustic microscopy by disentang...

arXiv - AI · 3 min ·
[2602.15945] From Tool Orchestration to Code Execution: A Study of MCP Design Choices
Machine Learning

This paper explores the design choices of Model Context Protocols (MCPs) and introduces Code Execution MCP (CE-MCP) as a solution to scal...

arXiv - AI · 4 min ·
[2602.16698] Causality is Key for Interpretability Claims to Generalise
LLMs

This paper discusses the importance of causality in interpretability research for large language models, highlighting pitfalls in general...

arXiv - Machine Learning · 4 min ·
[2602.15919] Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Machine Learning

The paper presents a method for assessing privacy vulnerability in machine learning models using a generalized leverage score, enabling e...

arXiv - Machine Learning · 3 min ·