AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·
[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Generative AI

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min ·
[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
LLMs

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min ·

All Content

[2602.01696] Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
AI Safety

This paper presents CMAFNet, a novel network for detecting small defects in transmission lines using RGB-D data, achieving significant pe...

arXiv - AI · 4 min ·
[2602.01023] Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
NLP

This paper presents a unified framework for Query Auto-Completion (QAC) that integrates Retrieval-Augmented Generation (RAG) and multi-ob...

arXiv - AI · 4 min ·
[2601.20538] Interpreting Emergent Extreme Events in Multi-Agent Systems
LLMs

This paper presents a framework for interpreting emergent extreme events in multi-agent systems, focusing on the origins and drivers of t...

arXiv - AI · 4 min ·
[2511.15315] Robust Bayesian Optimisation with Unbounded Corruptions
Machine Learning

The paper introduces RCGP-UCB, a robust Bayesian optimization algorithm designed to handle extreme outliers by allowing unbounded corrupt...

arXiv - Machine Learning · 3 min ·
[2601.15109] An Agentic Operationalization of DISARM for FIMI Investigation on Social Media
Robotics

This article presents a framework-agnostic, agent-based operationalization of the DISARM framework to investigate Foreign Information Man...

arXiv - AI · 4 min ·
[2511.07270] High-Dimensional Asymptotics of Differentially Private PCA
Data Science

This paper investigates the high-dimensional asymptotics of differentially private PCA, focusing on optimal noise levels for privacy guar...

arXiv - Machine Learning · 4 min ·
[2601.14172] Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum
Machine Learning

This article explores the detection of 19 human values in sentences using transformer models, demonstrating the learnability of moral pre...

arXiv - AI · 4 min ·
[2510.26046] Bias-Corrected Data Synthesis for Imbalanced Learning
Machine Learning

This paper presents a method for bias-corrected data synthesis aimed at improving classification accuracy in imbalanced learning scenario...

arXiv - Machine Learning · 4 min ·
[2601.02085] Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots
Robotics

This article presents a framework for early fault diagnosis and self-recovery in strawberry harvesting robots, leveraging vision-based te...

arXiv - AI · 4 min ·
[2512.14166] IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol
LLMs

The paper introduces IntentMiner, a novel approach to detect Intent Inversion Attacks in Large Language Models (LLMs) by analyzing tool c...

arXiv - AI · 4 min ·
[2512.12206] ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB
Machine Learning

The paper presents the ALERT dataset and an input-size-agnostic Vision Transformer (ISA-ViT) for driver activity recognition using IR-UWB...

arXiv - Machine Learning · 4 min ·
[2511.07293] Formal Reasoning About Confidence and Automated Verification of Neural Networks
Machine Learning

This paper presents a framework for formal reasoning about the confidence and robustness of neural networks, proposing a unified techniqu...

arXiv - AI · 3 min ·
[2511.01144] AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
LLMs

The paper presents AthenaBench, a dynamic benchmark designed to evaluate large language models (LLMs) in the context of Cyber Threat Inte...

arXiv - AI · 4 min ·
[2508.17622] The Statistical Fairness-Accuracy Frontier
Machine Learning

This article explores the trade-offs between fairness and accuracy in predictive modeling, introducing the fairness-accuracy (FA) Pareto ...

arXiv - Machine Learning · 3 min ·
[2507.07139] Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Machine Learning

The paper presents Recall, a novel adversarial framework that targets the robustness of image generation model unlearning, revealing vuln...

arXiv - Machine Learning · 4 min ·
[2506.05402] Lorica: A Synergistic Fine-Tuning Framework for Advancing Personalized Adversarial Robustness
Machine Learning

The paper presents Lorica, a novel framework aimed at enhancing personalized adversarial robustness in machine learning models, particula...

arXiv - Machine Learning · 4 min ·
[2510.04398] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
LLMs

The paper presents SECA, a method for eliciting hallucinations in large language models (LLMs) through semantically equivalent and cohere...

arXiv - Machine Learning · 4 min ·
[2510.02356] Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
LLMs

This article presents EAPrivacy, a benchmark for evaluating the physical-world privacy awareness of large language models (LLMs), reveali...

arXiv - AI · 4 min ·
[2510.00232] BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
LLMs

The paper introduces BiasFreeBench, a benchmark designed to evaluate bias mitigation techniques in large language models (LLMs) by provid...

arXiv - Machine Learning · 4 min ·
[2505.12185] EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming
LLMs

The paper introduces EVALOOOP, a framework for assessing the robustness of large language models (LLMs) in programming tasks through self...

arXiv - Machine Learning · 4 min ·