[2510.00463] On the Adversarial Robustness of Learning-based Conformal Novelty Detection
Summary
This paper investigates the adversarial robustness of learning-based conformal novelty detection, showing that adversarial perturbations can drive the false discovery rate (FDR) well above its nominal level despite guarantees that hold under benign conditions.
Why It Matters
As machine learning models are increasingly deployed in real-world applications, understanding their vulnerabilities to adversarial attacks is crucial. This research highlights the limitations of current novelty detection methods, emphasizing the need for more robust alternatives to ensure reliability in critical applications.
Key Takeaways
- Adversarial perturbations can significantly increase false discovery rates in novelty detection methods.
- The study formulates an oracle attack setup that quantifies the worst-case degradation of FDR, deriving an upper bound on the statistical cost of attacks.
- Two learning-based frameworks were evaluated, exposing their vulnerabilities and motivating the need for improved robustness (a sketch of their shared conformal testing pipeline follows this list).
- The research provides a systematic evaluation using both synthetic and real-world datasets.
- Findings suggest that existing error-controlled novelty detection methods have fundamental limitations.
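Both frameworks reduce novelty detection to multiple hypothesis testing: a trained score function turns each test point into a conformal p-value, and the Benjamini-Hochberg (BH) procedure is applied to those p-values for finite-sample FDR control. The Python sketch below illustrates that shared backbone under simplified assumptions; the Gaussian toy scores and the names conformal_pvalues and benjamini_hochberg are illustrative stand-ins, not the paper's implementation.

# Minimal sketch of the conformal testing pipeline shared by AdaDetect-style
# and one-class-classifier approaches. Scores are assumed to be "larger means
# more novel"; in practice they would come from a positive-unlabeled classifier
# (AdaDetect) or a one-class model (Bates et al.).
import numpy as np

def conformal_pvalues(cal_scores: np.ndarray, test_scores: np.ndarray) -> np.ndarray:
    """Conformal p-value of each test point against nominal calibration scores."""
    n = len(cal_scores)
    # p_j = (1 + #{calibration scores >= test score j}) / (n + 1)
    return (1 + (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n + 1)

def benjamini_hochberg(pvals: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Indices rejected by the BH procedure at level alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])  # largest sorted index meeting the BH threshold
    return order[: k + 1]

# Toy usage: calibrate on held-out nominal scores, then test a mixed batch.
rng = np.random.default_rng(0)
cal_scores = rng.normal(size=500)                        # nominal calibration scores
test_scores = np.concatenate([rng.normal(size=90),       # nominal test points
                              rng.normal(3.0, size=10)]) # true novelties
rejected = benjamini_hochberg(conformal_pvalues(cal_scores, test_scores), alpha=0.1)
print("declared novel:", np.sort(rejected))

On benign data like this, BH at level alpha = 0.1 keeps the expected fraction of falsely flagged nominals below 10%; the paper's point is that adversarial perturbations can push the realized FDR well above that nominal level.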
Abstract
Authors: Daofu Zhang, Mehrdad Pournaderi, Hanne M. Clifford, Yu Xiang, Pramod K. Varshney. Subject: Statistics > Machine Learning (arXiv:2510.00463). Submitted on 1 Oct 2025 (v1); last revised 20 Feb 2026 (v3).
This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on two powerful learning-based frameworks that come with finite-sample false discovery rate (FDR) control: AdaDetect (Marandon et al., 2024), which is based on a positive-unlabeled classifier, and a one-class classifier-based approach (Bates et al., 2023). While they provide rigorous statistical guarantees under benign conditions, their behavior under adversarial perturbations remains underexplored. We first formulate an oracle attack setup, under the AdaDetect formulation, that quantifies the worst-case degradation of FDR, deriving an upper bound that characterizes the statistical cost of attacks. This idealized formulation directly motivates a practical and effective attack scheme that requires only query access to the output labels of both frameworks. Coupling these formulations with two popular and complementary black-box adversarial algorithms, we systematically evaluate both frameworks on synthetic and real-world datasets.
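The practical attack described in the abstract needs only the detector's output labels (novel / not novel) for queried inputs. The sketch below is a hedged illustration of such a label-only loop under that query-access assumption; plain random search stands in for the two black-box adversarial algorithms the abstract leaves unnamed, and label_only_attack, detector, and the toy threshold rule are all hypothetical names, not the paper's method.

# Hedged sketch of a label-only (decision-based) attack on a novelty detector.
# No gradients or scores are used: only the detector's accept/reject label.
import numpy as np

def label_only_attack(x, detector, eps=2.0, n_queries=200, seed=0):
    """Search for a bounded perturbation that flips the detector's decision on x."""
    rng = np.random.default_rng(seed)
    target = not detector(x)  # aim for the opposite of the current decision
    for _ in range(n_queries):
        delta = rng.uniform(-eps, eps, size=x.shape)  # candidate perturbation
        if detector(x + delta) == target:
            return x + delta  # success: decision flipped within the budget
    return None  # no flip found within the query budget

# Toy usage: a threshold rule that flags large-norm points as novel.
detector = lambda z: bool(np.linalg.norm(z) > 3.0)
x_nominal = np.zeros(4)  # a clearly nominal point
x_adv = label_only_attack(x_nominal, detector)
print("flipped to 'novel':", x_adv is not None)

An adversary that flips enough nominal test points into the rejection region this way inflates the number of false discoveries, which is exactly the FDR degradation that the paper's oracle analysis upper-bounds.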