AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·
[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Generative AI

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min ·
[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
LLMs

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min ·

All Content

[2602.01696] Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
AI Safety

This paper presents CMAFNet, a novel network for detecting small defects in transmission lines using RGB-D data, achieving significant pe...

arXiv - AI · 4 min ·
[2602.01023] Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
NLP

This paper presents a unified framework for Query Auto-Completion (QAC) that integrates Retrieval-Augmented Generation (RAG) and multi-ob...

arXiv - AI · 4 min ·
[2601.20538] Interpreting Emergent Extreme Events in Multi-Agent Systems
LLMs

This paper presents a framework for interpreting emergent extreme events in multi-agent systems, focusing on the origins and drivers of t...

arXiv - AI · 4 min ·
[2511.15315] Robust Bayesian Optimisation with Unbounded Corruptions
Machine Learning

The paper introduces RCGP-UCB, a robust Bayesian optimization algorithm designed to handle extreme outliers by allowing unbounded corrupt...

arXiv - Machine Learning · 3 min ·
[2601.15109] An Agentic Operationalization of DISARM for FIMI Investigation on Social Media
Robotics

This article presents a framework-agnostic, agent-based operationalization of the DISARM framework to investigate Foreign Information Man...

arXiv - AI · 4 min ·
[2511.07270] High-Dimensional Asymptotics of Differentially Private PCA
Data Science

This paper investigates the high-dimensional asymptotics of differentially private PCA, focusing on optimal noise levels for privacy guar...

arXiv - Machine Learning · 4 min ·
[2601.14172] Human Values in a Single Sentence: Moral Presence, Hierarchies, and Transformer Ensembles on the Schwartz Continuum
Machine Learning

This article explores the detection of 19 human values in sentences using transformer models, demonstrating the learnability of moral pre...

arXiv - AI · 4 min ·
[2510.26046] Bias-Corrected Data Synthesis for Imbalanced Learning
Machine Learning

This paper presents a method for bias-corrected data synthesis aimed at improving classification accuracy in imbalanced learning scenario...

arXiv - Machine Learning · 4 min ·
[2601.02085] Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots
Robotics

This article presents a framework for early fault diagnosis and self-recovery in strawberry harvesting robots, leveraging vision-based te...

arXiv - AI · 4 min ·
[2512.14166] IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol
LLMs

The paper introduces IntentMiner, a novel approach to detect Intent Inversion Attacks in Large Language Models (LLMs) by analyzing tool c...

arXiv - AI · 4 min ·
[2512.12206] ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB
Machine Learning

The paper presents the ALERT dataset and an input-size-agnostic Vision Transformer (ISA-ViT) for driver activity recognition using IR-UWB...

arXiv - Machine Learning · 4 min ·
[2511.07293] Formal Reasoning About Confidence and Automated Verification of Neural Networks
Machine Learning

This paper presents a framework for formal reasoning about the confidence and robustness of neural networks, proposing a unified techniqu...

arXiv - AI · 3 min ·
[2511.01144] AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
LLMs

The paper presents AthenaBench, a dynamic benchmark designed to evaluate large language models (LLMs) in the context of Cyber Threat Inte...

arXiv - AI · 4 min ·
[2508.17622] The Statistical Fairness-Accuracy Frontier
Machine Learning

This article explores the trade-offs between fairness and accuracy in predictive modeling, introducing the fairness-accuracy (FA) Pareto ...

arXiv - Machine Learning · 3 min ·
[2507.07139] Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Machine Learning

The paper presents Recall, a novel adversarial framework that targets the robustness of image generation model unlearning, revealing vuln...

arXiv - Machine Learning · 4 min ·
[2506.05402] Lorica: A Synergistic Fine-Tuning Framework for Advancing Personalized Adversarial Robustness
Machine Learning

The paper presents Lorica, a novel framework aimed at enhancing personalized adversarial robustness in machine learning models, particula...

arXiv - Machine Learning · 4 min ·
[2510.04398] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
LLMs

The paper presents SECA, a method for eliciting hallucinations in large language models (LLMs) through semantically equivalent and cohere...

arXiv - Machine Learning · 4 min ·
[2510.02356] Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
LLMs

This article presents EAPrivacy, a benchmark for evaluating the physical-world privacy awareness of large language models (LLMs), reveali...

arXiv - AI · 4 min ·
[2510.00232] BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
LLMs

The paper introduces BiasFreeBench, a benchmark designed to evaluate bias mitigation techniques in large language models (LLMs) by provid...

arXiv - Machine Learning · 4 min ·
[2505.12185] EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming
LLMs

The paper introduces EVALOOOP, a framework for assessing the robustness of large language models (LLMs) in programming tasks through self...

arXiv - Machine Learning · 4 min ·