AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min ·
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min ·
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min ·

All Content

[2601.01016] Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study
Machine Learning

This study explores enhancements to Variational Autoencoders (VAEs) using Random Fourier Transformation (RFT) for anomaly detection in av...

arXiv - Machine Learning · 4 min ·
[2511.09763] Is nasty noise actually harder than malicious noise?
Machine Learning

This paper explores the complexities of learning Boolean functions in the presence of two noise models: malicious and nasty noise, highli...

arXiv - Machine Learning · 4 min ·
[2510.03269] General Exploratory Bonus for Optimistic Exploration in RLHF
Machine Learning

This paper introduces the General Exploratory Bonus (GEB) framework, which enhances optimistic exploration in reinforcement learning with...

arXiv - AI · 4 min ·
[2509.20936] GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series
Machine Learning

The paper introduces GenFacts, a generative framework for creating counterfactual explanations in multivariate time series, improving mod...

arXiv - Machine Learning · 3 min ·
[2509.18131] Randomness and signal propagation in physics-informed neural networks (PINNs): A neural PDE perspective
Machine Learning

This article investigates the randomness in weight matrices of physics-informed neural networks (PINNs) and its impact on signal propagat...

arXiv - Machine Learning · 4 min ·
[2508.16832] Out of Distribution Detection for Efficient Continual Learning in Quality Prediction for Arc Welding
Machine Learning

This article presents a novel approach to out-of-distribution detection in arc welding quality prediction, enhancing continual learning b...

arXiv - AI · 4 min ·
[2508.16237] A XAI-based Framework for Frequency Subband Characterization of Cough Spectrograms in Chronic Respiratory Disease
Machine Learning

This paper presents an explainable AI framework for analyzing cough sounds linked to chronic respiratory diseases, focusing on COPD. It u...

arXiv - AI · 4 min ·
[2508.06601] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
LLMs

This paper explores how filtering pretraining data can enhance the tamper-resistance of open-weight large language models (LLMs), present...

arXiv - AI · 4 min ·
[2508.11460] Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
Machine Learning

This article evaluates uncertainty estimates in binary classification models, comparing six probabilistic machine learning algorithms to ...

arXiv - Machine Learning · 4 min ·
[2507.01761] Enhanced Generative Model Evaluation with Clipped Density and Coverage
Machine Learning

This article presents novel metrics, Clipped Density and Clipped Coverage, aimed at improving the evaluation of generative models by enha...

arXiv - AI · 4 min ·
[2505.11824] Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
LLMs

This paper presents a novel method for identifying errors in stepwise reasoning using latent veracity inference, enhancing the reliabilit...

arXiv - AI · 4 min ·
[2602.15278] Visual Persuasion: What Influences Decisions of Vision-Language Models?
LLMs

This article explores how vision-language models (VLMs) make decisions based on image inputs, introducing a framework to analyze their pr...

arXiv - AI · 4 min ·
[2504.15206] How Global Calibration Strengthens Multiaccuracy
Machine Learning

This article explores how global calibration enhances multiaccuracy in machine learning, revealing its potential to improve predictive fa...

arXiv - Machine Learning · 4 min ·
[2602.15265] From Diagnosis to Inoculation: Building Cognitive Resistance to AI Disempowerment
AI Safety

This article discusses the need for cognitive resistance to AI disempowerment, proposing an AI literacy framework based on pedagogical in...

arXiv - AI · 4 min ·
[2502.13022] Efficient and Sharp Off-Policy Learning under Unobserved Confounding
AI Safety

This paper presents a novel method for off-policy learning that addresses unobserved confounding, enhancing the accuracy of policy learni...

arXiv - Machine Learning · 4 min ·
[2502.03576] Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation
Robotics

This article presents a theoretical framework for clone-robust weighting functions in metric spaces, addressing redundancy bias in benchm...

arXiv - Machine Learning · 4 min ·
[2501.10466] Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
Machine Learning

This paper presents a novel approach to enhance semi-supervised adversarial training (SSAT) by employing latent clustering-based data red...

arXiv - AI · 4 min ·
[2412.20987] RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses
Machine Learning

The paper 'RobustBlack' explores the effectiveness of black-box adversarial attacks against state-of-the-art defenses, revealing signific...

arXiv - Machine Learning · 3 min ·
[2602.15198] Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
LLMs

The paper introduces Colosseum, a framework designed to audit collusion in cooperative multi-agent systems, highlighting the risks of age...

arXiv - AI · 3 min ·
[2406.03862] Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Robotics

This paper explores behavior-targeted attacks on reinforcement learning systems and proposes a novel defense strategy using time-discount...

arXiv - AI · 3 min ·