AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

China drafts law regulating 'digital humans' and banning addictive virtual services for children
AI Safety

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Generative AI

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min

[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
LLMs

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min


All Content

[2602.15278] Visual Persuasion: What Influences Decisions of Vision-Language Models?
LLMs

This article explores how vision-language models (VLMs) make decisions based on image inputs, introducing a framework to analyze their pr...

arXiv - AI · 4 min

[2504.15206] How Global Calibration Strengthens Multiaccuracy
Machine Learning

This article explores how global calibration enhances multiaccuracy in machine learning, revealing its potential to improve predictive fa...

arXiv - Machine Learning · 4 min

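For readers new to these terms: multiaccuracy asks that a predictor's residuals vanish on every audited subgroup, while calibration asks that predictions match observed rates within each prediction bin. The sketch below illustrates the two checks on synthetic data; it is not the paper's method, and the subgroup definition and data are invented.

```python
# Illustration of the two notions in the title on synthetic data.
# Not the paper's method; the subgroup and data are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(0, 1, size=n)                  # single feature
group = (x > 0.5).astype(int)                  # an audited subgroup
y = rng.binomial(1, 0.2 + 0.6 * x)             # labels from the true rate
f = np.clip(0.2 + 0.6 * x + 0.05 * (2 * group - 1), 0, 1)  # group-biased predictor

# Multiaccuracy check: mean residual should be ~0 on each subgroup.
for g in (0, 1):
    print(f"group {g}: mean residual = {np.mean(y[group == g] - f[group == g]):+.3f}")

# Calibration check: within each prediction bin, observed rate ~ predicted.
bins = np.digitize(f, np.linspace(0, 1, 11))
for b in np.unique(bins):
    m = bins == b
    print(f"bin {b:2d}: predicted {f[m].mean():.2f}, observed {y[m].mean():.2f}")
```
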
[2602.15265] From Diagnosis to Inoculation: Building Cognitive Resistance to AI Disempowerment
AI Safety

This article discusses the need for cognitive resistance to AI disempowerment, proposing an AI literacy framework based on pedagogical in...

arXiv - AI · 4 min

[2502.13022] Efficient and Sharp Off-Policy Learning under Unobserved Confounding
AI Safety

This paper presents a novel method for off-policy learning that addresses unobserved confounding, enhancing the accuracy of policy learni...

arXiv - Machine Learning · 4 min

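For context, the textbook inverse-propensity-weighted (IPW) estimator below shows the standard off-policy setup; it assumes all confounders are observed, which is precisely the assumption this paper relaxes. The data and policies are synthetic stand-ins.

```python
# Textbook IPW off-policy value estimate on synthetic data. Plain IPW
# assumes no unobserved confounding -- the assumption the paper relaxes.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)                      # observed covariate
pi_b = 1 / (1 + np.exp(-x))                 # behavior policy P(a=1 | x)
a = rng.binomial(1, pi_b)
y = x * a + rng.normal(scale=0.5, size=n)   # observed outcome

pi_e = np.full(n, 0.5)                      # target policy: a=1 w.p. 0.5
w = np.where(a == 1, pi_e / pi_b, (1 - pi_e) / (1 - pi_b))
print("IPW estimate of target-policy value:", np.mean(w * y))
```
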
[2502.03576] Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation
Robotics

This article presents a theoretical framework for clone-robust weighting functions in metric spaces, addressing redundancy bias in benchm...

arXiv - Machine Learning · 4 min

[2501.10466] Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
Machine Learning

This paper presents a novel approach to enhance semi-supervised adversarial training (SSAT) by employing latent clustering-based data red...

arXiv - AI · 4 min

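The title suggests shrinking the unlabeled pool by clustering in a latent space. One generic reading of that idea, sketched below with random stand-in embeddings, is to keep only the point nearest each cluster centroid; the paper's actual selection rule may differ.

```python
# Generic latent-clustering data reduction: keep one representative
# (the point nearest each centroid) per cluster. A plausible reading of
# the title, not the paper's algorithm; embeddings are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
z = rng.normal(size=(5000, 32))           # stand-in latent embeddings
k = 100                                   # target reduced-set size
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(z)

d = km.transform(z)                       # (n_points, k) centroid distances
keep = np.argmin(d, axis=0)               # nearest point to each centroid
print("reduced set:", z[keep].shape)      # (100, 32)
```
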
[2412.20987] RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses
Machine Learning

The paper 'RobustBlack' explores the effectiveness of black-box adversarial attacks against state-of-the-art defenses, revealing signific...

arXiv - Machine Learning · 3 min

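The black-box threat model here means the attacker sees only model scores, not gradients. The toy loop below shows the basic score-based attack pattern (random signed steps, keeping whatever lowers the true-class score); RobustBlack's own attacks and evaluation are more involved.

```python
# Skeleton of a score-based black-box attack: random signed steps,
# keeping any step that lowers the true-class score. Toy illustration of
# the threat model, not the paper's attack; the "model" is a stand-in.
import numpy as np

rng = np.random.default_rng(3)

def model_score(x):
    # Stand-in for a black-box model's true-class probability.
    return float(np.clip(1.0 - np.abs(x).mean(), 0.0, 1.0))

x = rng.uniform(-0.1, 0.1, size=(28, 28))
score = model_score(x)
for _ in range(200):                        # query budget
    cand = np.clip(x + 0.05 * rng.choice([-1.0, 1.0], size=x.shape), -1, 1)
    if (s := model_score(cand)) < score:
        x, score = cand, s
print("final true-class score:", round(score, 3))
```
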
[2602.15198] Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
LLMs

The paper introduces Colosseum, a framework designed to audit collusion in cooperative multi-agent systems, highlighting the risks of age...

arXiv - AI · 3 min

[2406.03862] Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Robotics

This paper explores behavior-targeted attacks on reinforcement learning systems and proposes a novel defense strategy using time-discount...

arXiv - AI · 3 min

[2405.21012] IGC-Net for conditional average potential outcome estimation over time
NLP

The paper introduces IGC-Net, a novel neural model designed for estimating conditional average potential outcomes (CAPOs) over time, addr...

arXiv - Machine Learning · 4 min

[2602.15830] Ensemble-size-dependence of deep-learning post-processing methods that minimize an (un)fair score: motivating examples and a proof-of-concept solution
Machine Learning

This paper explores the ensemble-size dependence of deep-learning post-processing methods aimed at minimizing unfair scores in ensemble f...

arXiv - Machine Learning · 4 min

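Background for the "(un)fair score" in the title: the standard ensemble CRPS estimator is biased for small ensembles, while the "fair" variant corrects this by rescaling the spread term. The sketch below uses the standard formulas from the forecast-verification literature, not code from the paper, to show the size dependence numerically.

```python
# Standard vs. "fair" ensemble CRPS estimators (forecast-verification
# literature), showing the ensemble-size dependence the title refers to.
# Background only -- not code from the paper.
import numpy as np

def crps_ensemble(x, y, fair=False):
    m = len(x)
    spread = np.sum(np.abs(x[:, None] - x[None, :]))
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return np.mean(np.abs(x - y)) - spread / denom

rng = np.random.default_rng(4)
for m in (2, 5, 50):                     # ensemble sizes
    unfair, fair = [], []
    for _ in range(20_000):              # members ~ N(0,1), observation y=0
        x = rng.normal(size=m)
        unfair.append(crps_ensemble(x, 0.0))
        fair.append(crps_ensemble(x, 0.0, fair=True))
    print(f"m={m:2d}  standard={np.mean(unfair):.3f}  fair={np.mean(fair):.3f}")
```

The standard estimator's expected score shrinks as the ensemble grows, so a post-processor trained to minimize it is implicitly tuned to the training ensemble size; the fair variant stays roughly constant across sizes.
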
[2602.15756] A Note on Non-Composability of Layerwise Approximate Verification for Neural Inference
Machine Learning

This paper discusses the limitations of layerwise approximate verification in neural inference, presenting a counterexample that challeng...

arXiv - Machine Learning · 3 min

[2602.15064] Structural Divergence Between AI-Agent and Human Social Networks in Moltbook
AI Agents

This article explores the structural differences between AI-agent and human social networks on the Moltbook platform, revealing unique in...

arXiv - AI · 3 min

[2602.15061] Safe-SDL: Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
Robotics

The paper presents Safe-SDL, a framework for ensuring safety in AI-driven Self-Driving Laboratories, addressing the critical 'Syntax-to-S...

arXiv - AI · 4 min

[2602.15568] Scenario Approach with Post-Design Certification of User-Specified Properties
Data Science

This paper introduces a scenario approach for post-design certification of user-specified properties, enhancing reliability without addit...

arXiv - Machine Learning · 3 min

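As context for the scenario approach: its classical a-priori guarantee says that, for a convex program with d decision variables, drawing N >= (2/eps) * (ln(1/beta) + d) random constraint scenarios ensures, with confidence 1 - beta, that the solution violates a fresh scenario with probability at most eps. The helper below computes that bound; the paper's post-design certification is a complementary, after-the-fact guarantee.

```python
# Classical a-priori scenario-approach sample-size bound:
#   N >= (2/eps) * (ln(1/beta) + d)
# Background only; the paper's post-design certificates are computed
# after solving, rather than fixed in advance like this bound.
import math

def scenario_sample_size(eps: float, beta: float, d: int) -> int:
    return math.ceil((2 / eps) * (math.log(1 / beta) + d))

# e.g. 5% violation level, 1e-6 confidence parameter, 10 decision variables
print(scenario_sample_size(eps=0.05, beta=1e-6, d=10))  # -> 953
```
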
[2602.15552] Latent Regularization in Generative Test Input Generation
Machine Learning

This paper explores the effects of latent space regularization on the quality of generative test inputs for deep learning classifiers, de...

arXiv - Machine Learning · 3 min

[2602.15055] Beyond Context Sharing: A Unified Agent Communication Protocol (ACP) for Secure, Federated, and Autonomous Agent-to-Agent (A2A) Orchestration
LLMs

The paper introduces the Agent Communication Protocol (ACP), a framework for secure and efficient agent-to-agent orchestration, addressin...

arXiv - AI · 3 min

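The teaser doesn't describe ACP's wire format, so the sketch below only illustrates the kind of authenticated message envelope a secure A2A protocol implies: a serialized body plus an HMAC the recipient can verify. Every field name and the shared-key setup are hypothetical, not taken from the paper.

```python
# Hypothetical authenticated agent-to-agent envelope: illustrates the
# general pattern (signed, verifiable messages), NOT ACP's actual format.
import hashlib, hmac, json, time, uuid

SHARED_KEY = b"demo-key"  # placeholder; real deployments would use per-agent keys

def make_envelope(sender, recipient, payload):
    body = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "sender": sender,
        "recipient": recipient,
        "payload": payload,
    }
    raw = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(SHARED_KEY, raw, hashlib.sha256).hexdigest()
    return body

def verify_envelope(env):
    env = dict(env)
    sig = env.pop("sig")
    raw = json.dumps(env, sort_keys=True).encode()
    return hmac.compare_digest(sig, hmac.new(SHARED_KEY, raw, hashlib.sha256).hexdigest())

msg = make_envelope("planner-agent", "tool-agent", {"task": "fetch", "url": "https://example.com"})
print(verify_envelope(msg))  # True
```
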
[2602.15037] CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
LLMs

The paper introduces CircuChain, a benchmark for evaluating large language models (LLMs) in electrical circuit analysis, focusing on thei...

arXiv - AI · 4 min

[2602.15423] GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search
Machine Learning

GaiaFlow presents a novel framework for carbon-efficient search, employing semantic-guided diffusion tuning to balance retrieval accuracy...

arXiv - Machine Learning · 3 min

[2602.15785] This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
LLMs

This article discusses the use of large language models (LLMs) as synthetic participants in social science experiments, evaluating their ...

arXiv - AI · 4 min