AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min ·
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min ·
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min ·

All Content

[2601.01016] Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study
Machine Learning

This study explores enhancements to Variational Autoencoders (VAEs) using Random Fourier Transformation (RFT) for anomaly detection in av...

arXiv - Machine Learning · 4 min ·
[2511.09763] Is nasty noise actually harder than malicious noise?
Machine Learning

This paper explores the complexities of learning Boolean functions in the presence of two noise models: malicious and nasty noise, highli...

arXiv - Machine Learning · 4 min ·
[2510.03269] General Exploratory Bonus for Optimistic Exploration in RLHF
Machine Learning

This paper introduces the General Exploratory Bonus (GEB) framework, which enhances optimistic exploration in reinforcement learning with...

arXiv - AI · 4 min ·
[2509.20936] GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series
Machine Learning

The paper introduces GenFacts, a generative framework for creating counterfactual explanations in multivariate time series, improving mod...

arXiv - Machine Learning · 3 min ·
[2509.18131] Randomness and signal propagation in physics-informed neural networks (PINNs): A neural PDE perspective
Machine Learning

This article investigates the randomness in weight matrices of physics-informed neural networks (PINNs) and its impact on signal propagat...

arXiv - Machine Learning · 4 min ·
[2508.16832] Out of Distribution Detection for Efficient Continual Learning in Quality Prediction for Arc Welding
Machine Learning

This article presents a novel approach to out-of-distribution detection in arc welding quality prediction, enhancing continual learning b...

arXiv - AI · 4 min ·
[2508.16237] A XAI-based Framework for Frequency Subband Characterization of Cough Spectrograms in Chronic Respiratory Disease
Machine Learning

This paper presents an explainable AI framework for analyzing cough sounds linked to chronic respiratory diseases, focusing on COPD. It u...

arXiv - AI · 4 min ·
[2508.06601] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
LLMs

This paper explores how filtering pretraining data can enhance the tamper-resistance of open-weight large language models (LLMs), present...

arXiv - AI · 4 min ·
[2508.11460] Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
Machine Learning

This article evaluates uncertainty estimates in binary classification models, comparing six probabilistic machine learning algorithms to ...

arXiv - Machine Learning · 4 min ·
[2507.01761] Enhanced Generative Model Evaluation with Clipped Density and Coverage
Machine Learning

This article presents novel metrics, Clipped Density and Clipped Coverage, aimed at improving the evaluation of generative models by enha...

arXiv - AI · 4 min ·
[2505.11824] Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
LLMs

This paper presents a novel method for identifying errors in stepwise reasoning using latent veracity inference, enhancing the reliabilit...

arXiv - AI · 4 min ·
[2602.15278] Visual Persuasion: What Influences Decisions of Vision-Language Models?
LLMs

This article explores how vision-language models (VLMs) make decisions based on image inputs, introducing a framework to analyze their pr...

arXiv - AI · 4 min ·
[2504.15206] How Global Calibration Strengthens Multiaccuracy
Machine Learning

This article explores how global calibration enhances multiaccuracy in machine learning, revealing its potential to improve predictive fa...

arXiv - Machine Learning · 4 min ·
[2602.15265] From Diagnosis to Inoculation: Building Cognitive Resistance to AI Disempowerment
AI Safety

This article discusses the need for cognitive resistance to AI disempowerment, proposing an AI literacy framework based on pedagogical in...

arXiv - AI · 4 min ·
[2502.13022] Efficient and Sharp Off-Policy Learning under Unobserved Confounding
AI Safety

This paper presents a novel method for off-policy learning that addresses unobserved confounding, enhancing the accuracy of policy learni...

arXiv - Machine Learning · 4 min ·
[2502.03576] Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation
Robotics

This article presents a theoretical framework for clone-robust weighting functions in metric spaces, addressing redundancy bias in benchm...

arXiv - Machine Learning · 4 min ·
[2501.10466] Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
Machine Learning

This paper presents a novel approach to enhance semi-supervised adversarial training (SSAT) by employing latent clustering-based data red...

arXiv - AI · 4 min ·
[2412.20987] RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses
Machine Learning

The paper 'RobustBlack' explores the effectiveness of black-box adversarial attacks against state-of-the-art defenses, revealing signific...

arXiv - Machine Learning · 3 min ·
[2602.15198] Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
LLMs

The paper introduces Colosseum, a framework designed to audit collusion in cooperative multi-agent systems, highlighting the risks of age...

arXiv - AI · 3 min ·
[2406.03862] Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Robotics

This paper explores behavior-targeted attacks on reinforcement learning systems and proposes a novel defense strategy using time-discount...

arXiv - AI · 3 min ·