[2602.21479] Global Sequential Testing for Multi-Stream Auditing
Summary
The paper develops new global sequential tests, built by merging test martingales, for auditing machine learning systems across multiple data streams and detecting unusual behavior quickly.
Why It Matters
This research addresses the need for continuous auditing of deployed machine learning systems, especially in risk-sensitive applications. Faster, valid sequential tests mean that unusual behavior is detected sooner without inflating the false-alarm rate beyond the chosen significance level.
Key Takeaways
- Introduces new sequential tests for multi-stream data auditing.
- Merges test martingales to build global tests with different stopping-time trade-offs.
- Demonstrates effectiveness on both synthetic and real-world datasets.
- Proposes a balanced test that matches Bonferroni's $O\left(\ln\frac{k}{\alpha}\right)$ expected stopping time under sparse alternatives and improves it to $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under dense ones.
- Addresses the limitations of standard Bonferroni correction methods.
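The merging idea in the takeaways can be sketched in a small simulation. This is an illustrative example, not the paper's construction: the function name `betting_martingale`, the bet size `lam`, and the Bernoulli streams are all assumptions made for the sketch. Each stream carries a nonnegative betting-style wealth process that is a test martingale under its null, and under the global null the average of the $k$ test martingales is again a test martingale, so stopping when the average crosses $1/\alpha$ is valid by Ville's inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

def betting_martingale(xs, lam=0.8, p0=0.5):
    # Wealth process prod_t (1 + lam * (x_t - p0)).
    # Under the null (an i.i.d. Bernoulli(p0) stream) this is a
    # nonnegative martingale with initial value 1: a test martingale.
    return np.cumprod(1.0 + lam * (xs - p0))

k, T, alpha = 20, 2000, 0.05

# Dense alternative: every one of the k streams drifts to Bernoulli(0.7).
X = rng.binomial(1, 0.7, size=(k, T)).astype(float)
M = np.vstack([betting_martingale(x) for x in X])  # shape (k, T)

# Bonferroni-style global test: each stream is tested at level alpha/k,
# i.e. stop as soon as any single martingale reaches k/alpha.
bonf_stop = int(np.argmax(M.max(axis=0) >= k / alpha))

# Averaged (merged) test martingale: the mean of test martingales is
# itself a test martingale under the global null, so the threshold is 1/alpha.
avg_stop = int(np.argmax(M.mean(axis=0) >= 1 / alpha))

print(bonf_stop, avg_stop)
```

Under this dense alternative the averaged test can never stop later than the Bonferroni-style test, since the mean is at least the maximum divided by $k$. The trade-off runs the other way under sparse alternatives: if only one stream deviates, the average is diluted by a factor of $k$, which is the tension the paper's balanced test is designed to resolve.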
arXiv:2602.21479 (stat)
[Submitted on 25 Feb 2026]
Title: Global Sequential Testing for Multi-Stream Auditing
Authors: Beepul Bharti, Ambar Pal, Jeremias Sulam
Abstract: Across many risk-sensitive areas, it is critical to continuously audit the performance of machine learning systems and detect any unusual behavior quickly. This can be modeled as a sequential hypothesis testing problem with $k$ incoming streams of data and a global null hypothesis that asserts that the system is working as expected across all $k$ streams. The standard global test employs a Bonferroni correction and has an expected stopping time bound of $O\left(\ln\frac{k}{\alpha}\right)$ when $k$ is large and the significance level of the test, $\alpha$, is small. In this work, we construct new sequential tests by merging test martingales, obtaining different trade-offs in expected stopping times under sparse or dense alternative hypotheses. We further derive a new, balanced test that achieves an improved expected stopping time bound: it matches Bonferroni's in the sparse setting but naturally yields $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ under a dense alternative. We empirically demonstrate the effectiveness of our proposed tests on synthetic and real-world data.
Subjects: Machine Learning (st...
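A back-of-the-envelope reading of these bounds (an illustrative heuristic, not the paper's derivation): by Ville's inequality, a test martingale $M_t$ supports rejection once $M_t \ge 1/\alpha$. If $M_t$ grows at log-rate $r$ under the alternative, it crosses a threshold $c$ around $t \approx \frac{1}{r}\ln c$. A Bonferroni correction runs each stream at level $\alpha/k$, i.e. against the threshold $k/\alpha$, giving $t \approx \frac{1}{r}\ln\frac{k}{\alpha}$. Under a dense alternative where all $k$ independent streams grow at rate $r$, the product $\prod_{i=1}^{k} M_t^{(i)}$ is again a test martingale with log-rate $kr$, crossing $1/\alpha$ around $t \approx \frac{1}{kr}\ln\frac{1}{\alpha}$, which matches the $O\left(\frac{1}{k}\ln\frac{1}{\alpha}\right)$ scaling quoted in the abstract.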