AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min

All Content

[2602.15809] Decision Quality Evaluation Framework at Pinterest
LLMs

The article presents a Decision Quality Evaluation Framework developed at Pinterest to enhance content moderation by evaluating the quali...

arXiv - AI · 3 min
[2510.02348] mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
NLP

The paper introduces mini-vec2vec, an efficient method for aligning text embedding spaces using linear transformations, significantly imp...

arXiv - AI · 3 min
[2510.00565] Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
LLMs

This paper explores vulnerabilities in diffusion language models (DLMs) related to priming attacks and proposes a novel safety alignment ...

arXiv - Machine Learning · 4 min
[2509.21961] FlowDrive: moderated flow matching with data balancing for trajectory planning
Machine Learning

The paper presents FlowDrive, a trajectory planning method that addresses data imbalance in driving datasets by using moderated flow matc...

arXiv - AI · 3 min
[2602.15757] Beyond Binary Classification: Detecting Fine-Grained Sexism in Social Media Videos
NLP

The paper presents FineMuSe, a new dataset for detecting nuanced sexism in social media videos, addressing the limitations of binary clas...

arXiv - AI · 3 min
[2509.21609] VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment
Computer Vision

The paper presents VLCE, a framework that enhances image description for disaster assessment by integrating external semantic knowledge, ...

arXiv - Machine Learning · 4 min
[2509.16779] Improving User Interface Generation Models from Designer Feedback
LLMs

This paper explores enhancing user interface (UI) generation models by incorporating designer feedback, demonstrating improved performanc...

arXiv - Machine Learning · 4 min
[2602.15698] How to Disclose? Strategic AI Disclosure in Crowdfunding
AI Startups

The article examines the impact of strategic AI disclosure in crowdfunding, revealing that mandatory disclosure can significantly reduce ...

arXiv - AI · 4 min
[2602.15689] A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models
LLMs

This paper presents a content-based framework for cybersecurity refusal decisions in large language models, emphasizing the need for expl...

arXiv - AI · 3 min
[2602.15684] Estimating Human Muscular Fatigue in Dynamic Collaborative Robotic Tasks with Learning-Based Models
Machine Learning

This article presents a data-driven framework for estimating human muscular fatigue during collaborative robotic tasks using machine lear...

arXiv - AI · 4 min
[2504.14696] Reveal-or-Obscure: A Differentially Private Sampling Algorithm for Discrete Distributions
Machine Learning

The paper presents a differentially private sampling algorithm, Reveal-or-Obscure (ROO), for generating samples from discrete distributio...

arXiv - Machine Learning · 4 min
[2602.15654] Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections
LLMs

This paper discusses the security vulnerabilities of self-evolving LLM agents, introducing the concept of 'Zombie Agents' that can be cov...

arXiv - AI · 4 min
[2602.15620] STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
LLMs

The paper presents STAPO, a novel approach to stabilize reinforcement learning in large language models by silencing rare spurious tokens...

arXiv - AI · 4 min
[2602.15549] VLM-DEWM: Dynamic External World Model for Verifiable and Resilient Vision-Language Planning in Manufacturing
LLMs

The paper introduces VLM-DEWM, a novel cognitive architecture designed to enhance vision-language planning in manufacturing by addressing...

arXiv - AI · 4 min
[2602.00834] Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Models
Machine Learning

This paper introduces the Minimum Path Variance (MinPV) Principle, addressing the paradox of score-based methods in machine learning by m...

arXiv - Machine Learning · 3 min
[2602.00240] Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting
Machine Learning

Green-NAS presents a multi-objective neural architecture search framework aimed at optimizing weather forecasting models for low-resource...

arXiv - Machine Learning · 4 min
[2602.15485] SecCodeBench-V2 Technical Report
LLMs

SecCodeBench-V2 is a benchmark for evaluating LLMs' ability to generate secure code, featuring 98 scenarios across five programming langu...

arXiv - AI · 4 min
[2602.15439] Algorithmic Approaches to Opinion Selection for Online Deliberation: A Comparative Study
AI Agents

This article examines various algorithmic approaches to opinion selection in online deliberation, highlighting the trade-offs between div...

arXiv - AI · 3 min
[2601.01297] ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System
Data Science

The paper introduces ARGUS, a novel framework for detecting distributional drift in high-dimensional data streams, emphasizing geometric ...

arXiv - AI · 4 min
[2602.15376] A Unified Evaluation of Learning-Based Similarity Techniques for Malware Detection
AI Startups

This paper presents a systematic evaluation of learning-based similarity techniques for malware detection, comparing various methods unde...

arXiv - AI · 4 min