[2603.25075] Sparse Visual Thought Circuits in Vision-Language Models
Computer Science > Artificial Intelligence
arXiv:2603.25075 (cs)
[Submitted on 26 Mar 2026]

Title: Sparse Visual Thought Circuits in Vision-Language Models
Authors: Yunpeng Zhou

Abstract: Sparse autoencoders (SAEs) improve interpretability in multimodal models, but it remains unclear whether SAE features form modular, composable units for reasoning, an assumption underlying many intervention-based steering methods. We test this modularity hypothesis and find that it often fails: intervening on a task-selective feature set can modestly improve reasoning accuracy, while intervening on the union of two such sets reliably induces output drift (large unintended changes in predictions) and degrades accuracy, even under norm-matched perturbations. This non-modular circuit interference is consistent with shared internal pathways in which feature unions amplify activation shifts. We develop a reproducible causal pipeline to localize and test these sparse visual thought circuits in Qwen3-VL-8B. On a controlled synthetic benchmark with seven task types and three difficulty levels, linear probes identify a mid-decoder locus for task-type information. We train SAEs at this layer, construct task-selective sets via an explicit rule, and perform inference-time scaling and ablation while quantifying accuracy and drift. Our findings, validated with bootstrapped subsamples...
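To make the pipeline in the abstract concrete, the sketch below shows one plausible way to implement the inference-time feature-scaling intervention together with a norm-matched random control. The SAE architecture, the top-k selection rule, the scale factor, and all identifiers are illustrative assumptions for this sketch, not the paper's released code.

    # Minimal sketch (assumed, not the authors' implementation): scale a
    # task-selective set of SAE features at one layer's activations and
    # build a norm-matched random perturbation as a control.
    import torch
    import torch.nn as nn


    class SparseAutoencoder(nn.Module):
        """Minimal ReLU SAE: x ~ dec(relu(enc(x)))."""

        def __init__(self, d_model: int, d_features: int):
            super().__init__()
            self.enc = nn.Linear(d_model, d_features)
            self.dec = nn.Linear(d_features, d_model)

        def encode(self, x: torch.Tensor) -> torch.Tensor:
            return torch.relu(self.enc(x))

        def decode(self, f: torch.Tensor) -> torch.Tensor:
            return self.dec(f)


    def task_selective_features(acts_task, acts_other, top_k=32):
        """One possible explicit selection rule (assumed): rank features by the
        gap between mean activation on the target task and on other tasks."""
        gap = acts_task.mean(dim=0) - acts_other.mean(dim=0)
        return torch.topk(gap, k=top_k).indices


    def scaled_intervention(sae, x, feature_idx, scale=2.0):
        """Scale the selected features and add only the resulting delta back to
        x, so the SAE's reconstruction error is not injected into the stream."""
        f = sae.encode(x)
        f_scaled = f.clone()
        f_scaled[..., feature_idx] *= scale
        delta = sae.decode(f_scaled) - sae.decode(f)
        return x + delta, delta


    def norm_matched_control(x, delta):
        """Random perturbation with the same L2 norm as the feature-based delta."""
        noise = torch.randn_like(delta)
        noise = noise * (delta.norm(dim=-1, keepdim=True) /
                         (noise.norm(dim=-1, keepdim=True) + 1e-8))
        return x + noise


    if __name__ == "__main__":
        torch.manual_seed(0)
        sae = SparseAutoencoder(d_model=64, d_features=512)
        # Toy activations standing in for SAE features on task vs. non-task inputs.
        acts_task = sae.encode(torch.randn(100, 64))
        acts_other = sae.encode(torch.randn(100, 64))
        idx = task_selective_features(acts_task, acts_other)
        x = torch.randn(1, 64)                      # one residual-stream vector
        x_steered, delta = scaled_intervention(sae, x, idx)
        x_control = norm_matched_control(x, delta)
        print(delta.norm().item(), (x_control - x).norm().item())  # matched norms

Output drift could then be quantified by comparing model predictions under x_steered, x_control, and the unmodified activations; the abstract's union-of-sets experiment corresponds to passing the concatenated indices of two task-selective sets to scaled_intervention.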