[2603.26737] Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.26737 (cs) [Submitted on 21 Mar 2026]

Title: Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning

Authors: Guangfu Guo, Xiaoqian Lu, Yue Feng, Mingming Sun

Abstract: Current multimodal LLMs encode images as static visual prefixes and rely on text-based reasoning, so they lack goal-driven, adaptive visual access. Inspired by human visual perception, in which attention shifts selectively and sequentially from the most informative regions to secondary cues, we propose Structural Sequential Visual CoT (SSV-CoT). First, a question-relevant saliency map identifies and organizes key visual regions, explicitly modeling the spatial distribution of visual importance. Second, reasoning follows this discriminative order, inducing a curriculum-like semantic progression from primary to secondary cues. The method is trained end-to-end with text CoT and answer supervision, without relying on region-level annotations or specialized external tools. Experiments on diverse visual reasoning benchmarks show gains, validating structured and sequential visual cognition.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.2673...
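To make the two-step idea in the abstract concrete, here is a minimal sketch of ranking image regions by question relevance and visiting them from primary to secondary. It is not the paper's implementation: the abstract does not say how the saliency map is computed, so the dot-product scoring, the softmax normalization, and the names `saliency_scores`, `ordered_regions`, and `top_k` are all assumptions made for illustration.

```python
import math


def saliency_scores(patch_embeddings, question_embedding):
    """Score each patch against the question (assumed rule: dot product + softmax)."""
    logits = [
        sum(p * q for p, q in zip(patch, question_embedding))
        for patch in patch_embeddings
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordered_regions(scores, top_k):
    """Return patch indices from most to least salient, truncated to top_k."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy data (hypothetical): four 2-D patch embeddings and one question embedding.
patches = [[0.1, 0.0], [0.9, 0.8], [0.4, 0.2], [0.0, 0.1]]
question = [1.0, 1.0]

scores = saliency_scores(patches, question)
order = ordered_regions(scores, top_k=3)

# A downstream reasoner would attend to regions in this order,
# primary cue first, then secondary cues.
for step, idx in enumerate(order, start=1):
    print(f"step {step}: region {idx} (saliency {scores[idx]:.3f})")
```

In a trained model the scores would presumably come from learned components optimized end-to-end with the text CoT and answer losses; the fixed dot product here only stands in for that.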

Originally published on March 31, 2026. Curated by AI News.
