[2603.05276] Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
Computer Science > Machine Learning
arXiv:2603.05276 (cs)
[Submitted on 5 Mar 2026]
Title: Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
Authors: Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev
Abstract: In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper introduces the Whisperer, a novel visual prompting framework that learns diffusion-based preprocessors to adapt inputs in pixel space, effectively "whispering" enhancements to frozen downstream models like EasyOCR. By framing the process as behavioral cloning of stochastically discovered improvement policies, our method achieves an 8% absolute (10.6% relative) reduction in Character Error Rate (CER) on a challenging dataset of 300k degraded synthetic text images, surpassing hand-engineered baselines such as CLAHE. The key innovation is a four-stage training curriculum that uses behavioral cloning to amplify "lucky" improvements discovered through the stochastic exploration of a partially trained diffusion model. This approach is highly sample-efficient and avoids the pitfalls of traditional reinforcement learning. Crucially, we frame this not as naive rein...
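The abstract reports its gain as both an absolute and a relative reduction in Character Error Rate. As a minimal sketch of how those two figures relate, the Python below computes CER as Levenshtein edit distance divided by reference length (the common definition, assumed here since the abstract does not spell it out) and derives both kinds of reduction. The function names `edit_distance`, `cer`, and `reductions` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate, assumed to be edit distance over reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def reductions(baseline_cer: float, improved_cer: float) -> tuple[float, float]:
    """Return (absolute, relative) CER reduction against a baseline."""
    absolute = baseline_cer - improved_cer
    relative = absolute / baseline_cer
    return absolute, relative
```

Because the relative reduction is the absolute reduction divided by the baseline, the two reported figures would imply a baseline CER of roughly 0.08 / 0.106 ≈ 0.755, if both are measured against the same baseline. That estimate is approximate, since the reported figures are rounded.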