AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min

All Content

[2405.21012] IGC-Net for conditional average potential outcome estimation over time
NLP

The paper introduces IGC-Net, a novel neural model designed for estimating conditional average potential outcomes (CAPOs) over time, addr...

arXiv - Machine Learning · 4 min
[2602.15830] Ensemble-size-dependence of deep-learning post-processing methods that minimize an (un)fair score: motivating examples and a proof-of-concept solution
Machine Learning

This paper explores the ensemble-size dependence of deep-learning post-processing methods aimed at minimizing unfair scores in ensemble f...

arXiv - Machine Learning · 4 min
[2602.15756] A Note on Non-Composability of Layerwise Approximate Verification for Neural Inference
Machine Learning

This paper discusses the limitations of layerwise approximate verification in neural inference, presenting a counterexample that challeng...

arXiv - Machine Learning · 3 min
[2602.15064] Structural Divergence Between AI-Agent and Human Social Networks in Moltbook
AI Agents

This article explores the structural differences between AI-agent and human social networks on the Moltbook platform, revealing unique in...

arXiv - AI · 3 min
[2602.15061] Safe-SDL: Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
Robotics

The paper presents Safe-SDL, a framework for ensuring safety in AI-driven Self-Driving Laboratories, addressing the critical 'Syntax-to-S...

arXiv - AI · 4 min
[2602.15568] Scenario Approach with Post-Design Certification of User-Specified Properties
Data Science

This paper introduces a scenario approach for post-design certification of user-specified properties, enhancing reliability without addit...

arXiv - Machine Learning · 3 min
[2602.15552] Latent Regularization in Generative Test Input Generation
Machine Learning

This paper explores the effects of latent space regularization on the quality of generative test inputs for deep learning classifiers, de...

arXiv - Machine Learning · 3 min
[2602.15055] Beyond Context Sharing: A Unified Agent Communication Protocol (ACP) for Secure, Federated, and Autonomous Agent-to-Agent (A2A) Orchestration
LLMs

The paper introduces the Agent Communication Protocol (ACP), a framework for secure and efficient agent-to-agent orchestration, addressin...

arXiv - AI · 3 min
[2602.15037] CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
LLMs

The paper introduces CircuChain, a benchmark for evaluating large language models (LLMs) in electrical circuit analysis, focusing on thei...

arXiv - AI · 4 min
[2602.15423] GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search
Machine Learning

GaiaFlow presents a novel framework for carbon-efficient search, employing semantic-guided diffusion tuning to balance retrieval accuracy...

arXiv - Machine Learning · 3 min
[2602.15785] This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
LLMs

This article discusses the use of large language models (LLMs) as synthetic participants in social science experiments, evaluating their ...

arXiv - AI · 4 min
[2602.15368] GMAIL: Generative Modality Alignment for generated Image Learning
Machine Learning

The paper presents GMAIL, a novel framework for aligning generated images with real images in machine learning, enhancing performance in ...

arXiv - Machine Learning · 4 min
[2602.15326] SCENE OTA-FD: Self-Centering Noncoherent Estimator for Over-the-Air Federated Distillation
AI Safety

The paper presents SCENE, a novel estimator for over-the-air federated distillation that enhances aggregation without requiring pilot sig...

arXiv - Machine Learning · 3 min
[2602.15645] CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
LLMs

The article presents CARE Drive, a framework for evaluating the reason-responsiveness of vision language models in automated driving, add...

arXiv - AI · 4 min
[2602.15323] Unforgeable Watermarks for Language Models via Robust Signatures
LLMs

The paper presents a novel watermarking scheme for language models that ensures unforgeability and recoverability, enhancing content prov...

arXiv - Machine Learning · 4 min
[2602.15553] RUVA: Personalized Transparent On-Device Graph Reasoning
NLP

The paper presents RUVA, a novel architecture for personalized on-device graph reasoning that enhances user control over AI-generated con...

arXiv - AI · 3 min
[2602.15259] Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight
Generative AI

This paper discusses the limitations of generative AI agents that equate understanding with resolving explicit queries, highlighting the ...

arXiv - Machine Learning · 4 min
[2602.15532] Quantifying construct validity in large language model evaluations
LLMs

This paper presents a structured capabilities model to improve the construct validity of large language model (LLM) evaluations, addressi...

arXiv - Machine Learning · 4 min
[2602.15252] Decision Making under Imperfect Recall: Algorithms and Benchmarks
Machine Learning

This paper presents a benchmark suite for decision-making under imperfect recall in game theory, introducing regret matching algorithms t...

arXiv - Machine Learning · 4 min
[2602.15195] Weight space Detection of Backdoors in LoRA Adapters
LLMs

This article presents a novel method for detecting backdoors in LoRA adapters by analyzing their weight matrices, achieving high accuracy...

arXiv - Machine Learning · 3 min