[2603.14218] Interleaved Resampling and Refitting: Data and Compute-Efficient Evaluation of Black-Box Predictors
Computer Science > Machine Learning

arXiv:2603.14218 (cs)

[Submitted on 15 Mar 2026 (v1), last revised 2 Apr 2026 (this version, v2)]

Title: Interleaved Resampling and Refitting: Data and Compute-Efficient Evaluation of Black-Box Predictors

Authors: Haichen Hu, David Simchi-Levi

Abstract: We study the problem of evaluating the excess risk of large-scale empirical risk minimization under the square loss. Leveraging the ideas of wild refitting and resampling, we assume only black-box access to the training algorithm and develop an efficient procedure for estimating the excess risk. Our evaluation algorithm is both computationally and data efficient. In particular, it requires access to only a single dataset and does not rely on any additional validation data. Computationally, it only requires refitting the model on several much smaller datasets obtained through sequential resampling, in contrast to previous wild refitting methods that require full-scale retraining and may therefore be unsuitable for large-scale trained predictors. Our algorithm has an interleaved sequential resampling-and-refitting structure. We first construct pseudo-responses through a randomized residual symmetrization procedure. At each round, we then resample two sub-datasets from the resulting covariate pseudo-re...
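The abstract's first step, constructing pseudo-responses through randomized residual symmetrization, follows the general pattern of a wild-bootstrap-style perturbation. The sketch below is illustrative only and is not taken from the paper: it assumes a generic black-box predictor `predict` and flips the signs of its residuals with Rademacher random variables to form pseudo-responses; the paper's actual procedure and estimator may differ in detail.

```python
import numpy as np

def wild_pseudo_responses(X, y, predict, rng):
    """Form pseudo-responses by symmetrizing residuals with random signs.

    `predict` is treated as a black box: only its outputs on X are used,
    mirroring the black-box access assumption in the abstract.
    """
    fitted = predict(X)                   # black-box predictions on the data
    residuals = y - fitted                # empirical residuals
    signs = rng.choice([-1.0, 1.0], size=len(y))  # Rademacher symmetrization
    return fitted + signs * residuals     # pseudo-responses with flipped residuals

# Toy usage with a hypothetical black-box predictor (here: a constant mean model).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
predict = lambda X_: np.full(len(X_), y.mean())
y_tilde = wild_pseudo_responses(X, y, predict, rng)
```

Because symmetrization only flips residual signs, each pseudo-response sits at the same distance from the fitted value as the original response, which is what makes the resulting sub-datasets informative about the fit without requiring fresh validation data.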