[2603.22492] Tiny Inference-Time Scaling with Latent Verifiers
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.22492 (cs)

[Submitted on 23 Mar 2026]

Title: Tiny Inference-Time Scaling with Latent Verifiers

Authors: Davide Bucciarelli, Evelyn Turri, Lorenzo Baraldi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Abstract: Inference-time scaling has emerged as an effective way to improve generative models at test time by using a verifier to score and select candidate outputs. A common choice is to employ Multimodal Large Language Models (MLLMs) as verifiers, which can improve performance but introduces substantial inference-time cost. Diffusion pipelines operate in an autoencoder latent space precisely to reduce computation, yet MLLM verifiers still require decoding candidates to pixel space and re-encoding them into the visual embedding space, leading to redundant and costly operations. In this work, we propose Verifier on Hidden States (VHS), a verifier that operates directly on intermediate hidden representations of Diffusion Transformer (DiT) single-step generators. VHS analyzes generator features without decoding to pixel space, thereby reducing the per-candidate verification cost while matching or improving on the performance of MLLM-based competitors. We show that, under tiny inference budgets with only a small number of candidates per prompt, VHS enables more effi...
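The verify-and-select loop the abstract describes is a best-of-N scheme: generate N candidates, score each with a verifier, and keep the top-scoring one. The sketch below illustrates that loop over DiT-style hidden states; the mean-pool-plus-linear-head scorer is a hypothetical stand-in for the learned VHS verifier, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def verifier_score(hidden: np.ndarray, w: np.ndarray) -> float:
    """Score one candidate from its generator hidden states (tokens x dim).

    Mean-pools over tokens and applies a linear head. In VHS the scorer is a
    learned model over DiT features; the linear head here is illustrative only.
    """
    pooled = hidden.mean(axis=0)  # (dim,)
    return float(pooled @ w)

def best_of_n(candidates: list[np.ndarray], w: np.ndarray):
    """Best-of-N selection: score every candidate's hidden states directly,
    with no decode to pixel space, and return the index of the best one."""
    scores = [verifier_score(h, w) for h in candidates]
    return int(np.argmax(scores)), scores

# Toy example: 4 candidates, each with 16 tokens of 8-dim hidden features.
candidates = [rng.standard_normal((16, 8)) for _ in range(4)]
w = rng.standard_normal(8)  # hypothetical verifier weights
best, scores = best_of_n(candidates, w)
```

Because the verifier consumes the generator's own features, the VAE decode and MLLM re-encode steps drop out of the per-candidate cost, which is what makes tiny candidate budgets worthwhile.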