[2604.06912] Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.06912 (cs) [Submitted on 8 Apr 2026]

Title: Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Authors: Yuheng Shi, Xiaohuan Pei, Linfeng Wen, Minjing Dong, Chang Xu

Abstract: Multimodal Large Language Models (MLLMs) require high-resolution visual inputs for fine-grained tasks such as document understanding and dense scene perception. However, current global resolution-scaling paradigms indiscriminately flood the quadratic self-attention mechanism with visually redundant tokens, severely bottlenecking inference throughput while ignoring spatial sparsity and query intent. To overcome this, we propose Q-Zoom, a query-aware adaptive high-resolution perception framework that operates in an efficient coarse-to-fine manner. First, a lightweight Dynamic Gating Network safely bypasses high-resolution processing when coarse global features suffice. Second, for queries demanding fine-grained perception, a Self-Distilled Region Proposal Network (SD-RPN) precisely localizes the task-relevant Region-of-Interest (RoI) directly from intermediate feature spaces. To optimize these modules efficiently, the gating network uses a consistency-aware generation strategy to derive deterministic routing labels, while the SD-RPN employs a fully self-supervised …
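The page provides only the abstract, so the following is a minimal PyTorch sketch of the coarse-to-fine routing it describes, not the paper's implementation. Every name here is a hypothetical assumption: DynamicGate, propose_roi, answer_tokens, the encoder callable, the keep ratio, and the threshold tau are illustrative, and the toy cosine-similarity proposal merely stands in for the paper's SD-RPN; the consistency-aware routing-label generation is not sketched at all.

import torch
import torch.nn as nn


class DynamicGate(nn.Module):
    """Hypothetical lightweight gating head: scores, from pooled coarse
    features, whether the query can be answered without a high-res pass."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, 1)
        )

    def forward(self, coarse_tokens: torch.Tensor) -> torch.Tensor:
        # coarse_tokens: (B, N, D) -> (B,) probability that coarse suffices
        return torch.sigmoid(self.mlp(coarse_tokens.mean(dim=1))).squeeze(-1)


def propose_roi(patch_feats: torch.Tensor, query_feat: torch.Tensor,
                grid: int, keep: float = 0.1):
    """Toy stand-in for the SD-RPN: rank patches by cosine similarity to the
    query embedding and return the bounding box (in grid cells) covering the
    top-scoring patches."""
    sim = torch.cosine_similarity(patch_feats, query_feat[None, :], dim=-1)
    k = max(1, int(keep * sim.numel()))
    idx = sim.topk(k).indices                     # indices of query-relevant patches
    rows, cols = idx // grid, idx % grid
    return (rows.min().item(), cols.min().item(),
            rows.max().item() + 1, cols.max().item() + 1)


def answer_tokens(image_lr, image_hr, query_feat, encoder, gate, tau=0.5):
    """Coarse-to-fine routing: bypass when the gate is confident, otherwise
    crop the proposed RoI out of the high-res image and encode only that."""
    coarse = encoder(image_lr)                    # (1, N, D) coarse global tokens
    if gate(coarse).item() >= tau:                # gate says coarse view suffices
        return coarse
    grid = int(coarse.shape[1] ** 0.5)            # assume a square patch grid
    r0, c0, r1, c1 = propose_roi(coarse[0], query_feat, grid)
    H, W = image_hr.shape[-2:]                    # map grid cells to pixels
    y0, y1 = r0 * H // grid, r1 * H // grid
    x0, x1 = c0 * W // grid, c1 * W // grid
    fine = encoder(image_hr[..., y0:y1, x0:x1])   # high-res tokens for the RoI only
    return torch.cat([coarse, fine], dim=1)       # fused coarse + fine token stream

The intent mirrors the abstract's argument: the gate keeps the common case to a single low-resolution pass, and the RoI crop bounds how many high-resolution tokens ever reach the quadratic self-attention.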