AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

China drafts law regulating 'digital humans' and banning addictive virtual services for children

A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...

Reddit - Artificial Intelligence · 1 min ·
Generative AI

[2512.00408] Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Abstract page for arXiv paper 2512.00408: Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

arXiv - AI · 3 min ·
LLMs

[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min ·

All Content

LLMs

[2509.23519] ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

The paper introduces ReliabilityRAG, a framework designed to enhance the robustness of Retrieval-Augmented Generation (RAG) systems again...

arXiv - AI · 4 min ·
Machine Learning

[2509.22794] Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

This paper presents a novel algorithm for instrumental variable regression that ensures differential privacy while maintaining statistica...

arXiv - Machine Learning · 4 min ·
LLMs

[2509.18776] AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field

The paper introduces AECBench, a benchmark for evaluating large language models (LLMs) in the Architecture, Engineering, and Construction...

arXiv - Machine Learning · 4 min ·
AI Startups

[2504.09733] Epsilon-Neighborhood Decision-Boundary Governed Estimation (EDGE) of 2D Black Box Classifier Functions

The paper presents the Epsilon-Neighborhood Decision-Boundary Governed Estimation (EDGE) algorithm for efficiently estimating decision bo...

arXiv - Machine Learning · 4 min ·
LLMs

[2503.00187] Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

This paper presents a safety steering framework to enhance the robustness of large language models (LLMs) against multi-turn jailbreaking...

arXiv - Machine Learning · 4 min ·
AI Safety

[2502.01713] Auditing a Dutch Public Sector Risk Profiling Algorithm Using an Unsupervised Bias Detection Tool

This article presents an audit of a Dutch public sector risk profiling algorithm, utilizing an unsupervised bias detection tool to identi...

arXiv - Machine Learning · 4 min ·
LLMs

[2509.05311] Large Language Model Integration with Reinforcement Learning to Augment Decision-Making in Autonomous Cyber Operations

This article explores the integration of Large Language Models (LLMs) with Reinforcement Learning (RL) to enhance decision-making in auto...

arXiv - Machine Learning · 3 min ·
Robotics

[2411.12159] Sensor-fusion based Prognostics for Deep-space Habitats Exhibiting Multiple Unlabeled Failure Modes

This paper presents a novel unsupervised prognostics framework for deep-space habitats, addressing multiple unlabeled failure modes throu...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2508.19278] Towards Production-Worthy Simulation for Autonomous Cyber Operations

This article presents a framework for enhancing simulation environments in Autonomous Cyber Operations (ACO) by implementing new actions ...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2409.11972] Generation of Uncertainty-Aware High-Level Spatial Concepts in Factorized 3D Scene Graphs via Graph Neural Networks

This paper introduces a learning-based approach for generating uncertainty-aware high-level spatial concepts in 3D Scene Graphs, enhancin...

arXiv - Machine Learning · 4 min ·
NLP

[2508.03882] Simulating Cyberattacks through a Breach Attack Simulation (BAS) Platform empowered by Security Chaos Engineering (SCE)

This article presents a novel approach to simulating cyberattacks by integrating Security Chaos Engineering (SCE) into Breach Attack Simu...

arXiv - AI · 3 min ·
Machine Learning

[2602.11712] Potential-energy gating for robust state estimation in bistable stochastic systems

This article presents a novel method called potential-energy gating for robust state estimation in bistable stochastic systems, enhancing...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.11079] In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution

The paper introduces activation-based data attribution to identify and mitigate undesirable behaviors in production language models post-...

arXiv - AI · 3 min ·
Machine Learning

[2602.09238] Feature salience -- not task-informativeness -- drives machine learning model explanations

This paper investigates the factors influencing feature importance in machine learning model explanations, emphasizing that feature salie...

arXiv - Machine Learning · 4 min ·
LLMs

[2506.11526] Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis

This survey explores the role of foundation models in enhancing scenario generation and analysis for autonomous driving, addressing limit...

arXiv - AI · 4 min ·
Robotics

[2602.08655] From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism

This article presents a novel framework called Geometric Pessimism for Offline Reinforcement Learning (RL), enhancing performance in robo...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.06801] On the Non-Identifiability of Steering Vectors in Large Language Models

This paper explores the non-identifiability of steering vectors in large language models (LLMs), revealing that these vectors cannot be u...

arXiv - AI · 3 min ·
LLMs

[2602.06130] Self-Improving World Modelling with Latent Actions

The paper presents SWIRL, a framework for self-improving world modeling in machine learning, focusing on latent actions to enhance predic...

arXiv - AI · 4 min ·
LLMs

[2506.04051] High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

The paper presents HALT, a method for finetuning large language models (LLMs) to enhance reliability by generating responses only when co...

arXiv - AI · 4 min ·
AI Safety

[2602.05119] Unbiased Single-Queried Gradient for Combinatorial Objective

This paper presents a novel stochastic gradient method for combinatorial optimization that requires only a single query, enhancing effici...

arXiv - Machine Learning · 3 min ·