[2603.23055] Post-Selection Distributional Model Evaluation
arXiv:2603.23055 (stat)
Statistics > Machine Learning
[Submitted on 24 Mar 2026]

Title: Post-Selection Distributional Model Evaluation
Authors: Amirmohammad Farzaneh, Osvaldo Simeone

Abstract: Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-off between performance and reliability achievable by each model at test time. This task, which requires reliable estimates of the test-time KPI distributions, is complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, causing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on e-values, PS-DME controls the post-selection false coverage rate (FCR) for the distributional KPI estimates and is proven to be more sample-efficient than a baseline method based on sample splitting. Experiments on synthetic data, text-to-SQL decoding with large l...
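The abstract builds on e-values, which admit valid inference even after data-dependent selection. As background (not the paper's PS-DME procedure), the following minimal sketch illustrates the core guarantee: if E is a nonnegative statistic with expectation at most 1 under the null, Markov's inequality gives P(E >= 1/alpha) <= alpha, so "reject when E >= 1/alpha" is a level-alpha test. The likelihood-ratio construction below is a standard textbook example, chosen here only for illustration.

```python
import math
import random

random.seed(0)

def e_value(x, mu=1.0):
    # Likelihood ratio of N(mu, 1) against N(0, 1).
    # Under H0 (X ~ N(0, 1)) this has expectation exactly 1,
    # so it is a valid e-value.
    return math.exp(mu * x - 0.5 * mu ** 2)

alpha = 0.05
n = 100_000

# Under H0, Markov's inequality bounds the rejection probability:
# P(E >= 1/alpha) <= E[E] * alpha <= alpha.
rejections = sum(
    e_value(random.gauss(0.0, 1.0)) >= 1.0 / alpha for _ in range(n)
)
rate = rejections / n
print(f"empirical type-I error: {rate:.4f} (Markov bound: {alpha})")
```

The e-value threshold 1/alpha is deliberately conservative; the abstract's claim is that a framework exploiting this structure can still be more sample-efficient than naive sample splitting, since no data has to be held out solely for post-selection estimation.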