[2602.22107] Don't stop me now: Rethinking Validation Criteria for Model Parameter Selection
Summary
This paper examines how the validation criterion used for model parameter selection affects test performance in neural classifiers, finding that loss-based criteria outperform accuracy-based ones.
Why It Matters
Understanding the effectiveness of validation criteria is crucial for improving model performance in machine learning. This study challenges traditional reliance on validation accuracy, suggesting that loss-based metrics may yield better outcomes, which can significantly influence model training practices in the field.
Key Takeaways
- Early stopping based on validation accuracy often results in lower test accuracy.
- Loss-based validation criteria provide more stable and comparable test accuracy.
- Any single validation rule frequently falls short of the best test accuracy attainable across all epochs.
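The takeaways above can be illustrated with a minimal sketch (hypothetical numbers, not from the paper): given per-epoch validation histories, accuracy-based and loss-based selection can disagree about which checkpoint to keep.

```python
# Hypothetical per-epoch validation metrics for one training run.
val_acc  = [0.70, 0.81, 0.81, 0.80, 0.79]   # plateaus early; ties broken by first max
val_loss = [0.90, 0.55, 0.48, 0.45, 0.52]   # keeps improving past the accuracy plateau

# Accuracy-based selection keeps the epoch with the highest validation accuracy;
# loss-based selection keeps the epoch with the lowest validation loss.
best_by_acc  = max(range(len(val_acc)),  key=lambda e: val_acc[e])
best_by_loss = min(range(len(val_loss)), key=lambda e: val_loss[e])

print(best_by_acc, best_by_loss)  # prints "1 3": the two criteria pick different checkpoints
```

Because validation accuracy is a coarse, step-like signal, it often plateaus while the loss is still decreasing, so the two criteria can select different model parameters.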
arXiv:2602.22107 [cs] (Computer Science > Machine Learning)
[Submitted on 25 Feb 2026]
Title: Don't stop me now: Rethinking Validation Criteria for Model Parameter Selection
Authors: Andrea Apicella, Francesco Isgrò, Andrea Pollastro, Roberto Prevete
Abstract: Despite the extensive literature on training loss functions, the evaluation of generalization on the validation set remains underexplored. In this work, we conduct a systematic empirical and statistical study of how the validation criterion used for model selection affects test performance in neural classifiers, with attention to early stopping. Using fully connected networks on standard benchmarks under $k$-fold evaluation, we compare: (i) early stopping with patience and (ii) post-hoc selection over all epochs (i.e., no early stopping). Models are trained with cross-entropy, C-Loss, or PolyLoss; model parameter selection on the validation set uses accuracy or one of the three loss functions, each considered independently. Three main findings emerge. (1) Early stopping based on validation accuracy performs worst, consistently selecting checkpoints with lower test accuracy than both loss-based early stopping and post-hoc selection. (2) Loss-based validation criteria yield comparable and more stable test accuracy. (3) Across datasets and folds, any singl...
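The two selection regimes the abstract compares, (i) early stopping with patience and (ii) post-hoc selection over all epochs, can be sketched as follows. This is an illustrative implementation under common conventions (patience counted in epochs without a new best validation loss), not the paper's exact procedure, and the loss trajectory is hypothetical.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch whose checkpoint early stopping with patience keeps:
    training halts after `patience` consecutive epochs without a new best
    validation loss, and the best epoch seen so far is selected."""
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training; later epochs are never observed
    return best_epoch

# Hypothetical validation-loss trajectory with a temporary plateau.
losses = [0.9, 0.6, 0.65, 0.63, 0.4, 0.38]

stopped = early_stop_epoch(losses, patience=2)              # halts at the plateau
post_hoc = min(range(len(losses)), key=losses.__getitem__)  # scans all epochs

print(stopped, post_hoc)  # prints "1 5": early stopping misses the later improvement
```

The sketch makes the trade-off concrete: early stopping with a small patience can terminate on a temporary plateau and keep epoch 1, while post-hoc selection over all epochs finds the genuinely best checkpoint at epoch 5, consistent with the paper's finding that single stopping rules often underperform the best performance across all epochs.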