AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·
Generative AI

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min ·
LLMs

[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min ·

All Content

Machine Learning

[2602.15368] GMAIL: Generative Modality Alignment for generated Image Learning

The paper presents GMAIL, a novel framework for aligning generated images with real images in machine learning, enhancing performance in ...

arXiv - Machine Learning · 4 min ·
AI Safety

[2602.15326] SCENE OTA-FD: Self-Centering Noncoherent Estimator for Over-the-Air Federated Distillation

The paper presents SCENE, a novel estimator for over-the-air federated distillation that enhances aggregation without requiring pilot sig...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.15645] CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving

The article presents CARE Drive, a framework for evaluating the reason-responsiveness of vision language models in automated driving, add...

arXiv - AI · 4 min ·
LLMs

[2602.15323] Unforgeable Watermarks for Language Models via Robust Signatures

The paper presents a novel watermarking scheme for language models that ensures unforgeability and recoverability, enhancing content prov...

arXiv - Machine Learning · 4 min ·
NLP

[2602.15553] RUVA: Personalized Transparent On-Device Graph Reasoning

The paper presents RUVA, a novel architecture for personalized on-device graph reasoning that enhances user control over AI-generated con...

arXiv - AI · 3 min ·
Generative AI

[2602.15259] Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight

This paper discusses the limitations of generative AI agents that equate understanding with resolving explicit queries, highlighting the ...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.15532] Quantifying construct validity in large language model evaluations

This paper presents a structured capabilities model to improve the construct validity of large language model (LLM) evaluations, addressi...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15252] Decision Making under Imperfect Recall: Algorithms and Benchmarks

This paper presents a benchmark suite for decision-making under imperfect recall in game theory, introducing regret matching algorithms t...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.15195] Weight-Space Detection of Backdoors in LoRA Adapters

This article presents a novel method for detecting backdoors in LoRA adapters by analyzing their weight matrices, achieving high accuracy...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.15391] Improving LLM Reliability through Hybrid Abstention and Adaptive Detection

The paper presents a novel adaptive abstention system for Large Language Models (LLMs) that balances safety and utility by dynamically ad...

arXiv - AI · 4 min ·
LLMs

[2602.15384] World-Model-Augmented Web Agents with Action Correction

The paper presents WAC, a web agent that enhances task execution by integrating model collaboration, consequence simulation, and action r...

arXiv - AI · 3 min ·
Machine Learning

[2602.15161] Exploiting Layer-Specific Vulnerabilities to Backdoor Attack in Federated Learning

This paper presents the Layer Smoothing Attack (LSA), a novel backdoor attack exploiting layer-specific vulnerabilities in federated lear...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15298] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection

The paper presents X-MAP, a framework for analyzing and profiling misclassifications in spam and phishing detection, enhancing interpreta...

arXiv - AI · 3 min ·
Machine Learning

[2602.15274] When Remembering and Planning are Worth it: Navigating under Change

This article explores how various memory types enhance spatial navigation in changing environments, highlighting the efficiency of agents...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.15173] Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs

This paper examines the decision-making behaviors of large language models (LLMs) under uncertainty, contrasting reasoning models with co...

arXiv - AI · 3 min ·
AI Agents

[2602.15212] Secure and Energy-Efficient Wireless Agentic AI Networks

This paper presents a secure and energy-efficient wireless AI network that utilizes a supervisor AI agent to optimize reasoning tasks whi...

arXiv - AI · 4 min ·
AI Agents

[2602.15158] da Costa and Tarski meet Goguen and Carnap: a novel approach for ontological heterogeneity based on consequence systems

This paper introduces a novel approach to ontological heterogeneity, integrating concepts from Carnapian-Goguenism and consequence system...

arXiv - AI · 3 min ·
LLMs

[2602.15143] Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

This paper explores methods to protect language models from unauthorized knowledge distillation by modifying reasoning traces, focusing o...

arXiv - AI · 3 min ·
LLMs

[2602.15829] Operationalising the Superficial Alignment Hypothesis via Task Complexity

This article presents a new metric called task complexity to operationalize the Superficial Alignment Hypothesis, demonstrating how pre-t...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.15799] The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety

This paper explores how fine-tuning language models can inadvertently degrade safety measures, revealing structural vulnerabilities in al...

arXiv - AI · 4 min ·
