[2603.05276] Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

arXiv - Machine Learning March 06, 2026 4 min read

Computer Science > Machine Learning
arXiv:2603.05276 (cs)
[Submitted on 5 Mar 2026]

Title: Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

Authors: Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev

Abstract: In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper introduces the Whisperer, a novel visual prompting framework that learns diffusion-based preprocessors to adapt inputs in pixel space, effectively "whispering" enhancements to frozen downstream models like EasyOCR. By framing the process as behavioral cloning of stochastically discovered improvement policies, our method achieves an 8% absolute (10.6% relative) reduction in Character Error Rate (CER) on a challenging dataset of 300k degraded synthetic text images, surpassing hand-engineered baselines such as CLAHE.

The key innovation is a four-stage training curriculum that uses behavioral cloning to amplify "lucky" improvements discovered through the stochastic exploration of a partially trained diffusion model. This approach is highly sample-efficient and avoids the pitfalls of traditional reinforcement learning. Crucially, we frame this not as naive rein...
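To make the reported metric concrete, here is a minimal Python sketch of how Character Error Rate is commonly computed (edit distance divided by reference length) and of how an absolute and a relative reduction relate to each other. It is not taken from the paper: the function names `levenshtein`, `cer`, and `implied_baseline` are hypothetical, and the baseline CER it derives is only what the rounded figures in the abstract imply, not a number the abstract states.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def implied_baseline(absolute_drop: float, relative_drop: float) -> float:
    """Baseline error rate implied by relative_drop = absolute_drop / baseline."""
    return absolute_drop / relative_drop


if __name__ == "__main__":
    # Toy strings, not data from the paper.
    print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters

    # Assumption: the abstract's 8% and 10.6% are rounded, so this is approximate.
    baseline = implied_baseline(0.08, 0.106)
    print(round(baseline, 3), round(baseline - 0.08, 3))
```

Read this way, the two figures in the abstract are consistent with a baseline CER of roughly 75% falling to roughly 67%, which fits the description of the dataset as heavily degraded. The exact values would need to be checked against the paper itself.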

Originally published on March 06, 2026. Curated by AI News.
