[2511.10983] Binary Verification for Zero-Shot Vision
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.10983 (cs)
[Submitted on 14 Nov 2025 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: Binary Verification for Zero-Shot Vision
Authors: Rongbin Hu, Jeffrey Liu

Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. We further integrate the proposed REC workflow into a real-world video processing and editing system, and present the system architecture and end-to-end pipeline in the paper. Together, these components yield a simple and unified workflow tha...
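The resolution rule in step (ii) of the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `binary_verify` and the two callables `ask_true_false` and `ask_mcq` are hypothetical stand-ins for VLM queries. The abstract does not spell out what "remaining plausible candidates" means, so the sketch assumes it is the set of candidates judged True when several are accepted, and the full candidate list when none is.

```python
from typing import Callable, List, Sequence


def binary_verify(
    candidates: Sequence[str],
    ask_true_false: Callable[[str], bool],
    ask_mcq: Callable[[Sequence[str]], str],
) -> str:
    """Pick one candidate via per-candidate True/False checks with an MCQ fallback.

    ask_true_false and ask_mcq are hypothetical wrappers around a VLM call;
    they are assumptions made for this sketch, not an API from the paper.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    # Binarization: one True/False question per candidate.
    accepted: List[str] = [c for c in candidates if ask_true_false(c)]

    # Deterministic resolution: exactly one True means select it.
    if len(accepted) == 1:
        return accepted[0]

    # Otherwise revert to an MCQ. Assumption: the "remaining plausible
    # candidates" are the accepted ones, or all candidates if none was accepted.
    remaining = accepted if accepted else list(candidates)
    return ask_mcq(remaining)


if __name__ == "__main__":
    # Stub "model" that accepts exactly one candidate, so no MCQ is needed.
    options = ["left", "right", "center"]
    choice = binary_verify(
        options,
        ask_true_false=lambda c: c == "right",
        ask_mcq=lambda remaining: remaining[0],
    )
    print(choice)
```

Passing the model calls in as callables keeps the resolution logic deterministic and testable without a VLM; in practice each callable would format a prompt and parse the model's reply.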