[2603.26349] Generative Score Inference for Multimodal Data
Statistics > Machine Learning
arXiv:2603.26349 (stat)
[Submitted on 27 Mar 2026]

Title: Generative Score Inference for Multimodal Data
Authors: Xinyu Tian, Xiaotong Shen

Abstract: Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome these limitations, we introduce Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets across a wide range of multimodal learning problems. GSI utilizes synthetic samples generated by deep generative models to approximate conditional score distributions, facilitating precise uncertainty quantification without imposing restrictive assumptions about the data or tasks. We empirically validate GSI's capabilities through two representative scenarios: hallucination detection in large language models and uncertainty estimation in image captioning. Our method achieves state-of-the-art performance in hallucination detection and robust predictive uncertainty in image captioning, and i...