[2512.19941] Block-Recurrent Dynamics in Vision Transformers
Summary
This article introduces the Block-Recurrent Hypothesis (BRH) for Vision Transformers, proposing that the depth of a trained model can be understood as a small set of distinct blocks applied recurrently, rather than as a long sequence of unrelated layers.
Why It Matters
As Vision Transformers become increasingly prevalent in computer vision tasks, understanding their internal mechanisms is crucial for improving model efficiency and interpretability. The BRH offers a novel perspective that could enhance the design and application of these models, making it relevant for researchers and practitioners in AI and machine learning.
Key Takeaways
- The Block-Recurrent Hypothesis suggests that Vision Transformers can be understood through a block-recurrent structure.
- Empirical evidence shows that this structure can substantially reduce the number of distinct blocks needed at depth while maintaining performance.
- The study introduces a new model, Raptor, which demonstrates the effectiveness of the BRH in achieving high accuracy with fewer blocks.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.19941 (cs)
[Submitted on 23 Dec 2025 (v1), last revised 19 Feb 2026 (this version, v5)]
Title: Block-Recurrent Dynamics in Vision Transformers
Authors: Mozes Jacobs, Thomas Fel, Richard Hakim, Alessandra Brondetta, Demba Ba, T. Andy Keller
Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical structure, there is no settled framework that interprets Transformer depth as a well-characterized flow. In this work, we introduce the Block-Recurrent Hypothesis (BRH), arguing that trained ViTs admit a block-recurrent depth structure such that the computation of the original $L$ blocks can be accurately rewritten using only $k \ll L$ distinct blocks applied recurrently. Across diverse ViTs, between-layer representational similarity matrices suggest few contiguous phases. To determine whether these phases reflect genuinely reusable computation, we train block-recurrent surrogates of pretrained ViTs: Recurrent Approximations to Phase-structured TransfORmers (Raptor). In small-scale, we demonstrate that stochastic depth and training promote recurrent structure and subsequently correlate with our ability to accurately fit Raptor. We then ...
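To make the hypothesis concrete, the following is a minimal sketch of the block-recurrent idea: a network of depth $L$ is executed using only $k \ll L$ distinct blocks, with each depth position assigned to one block by a schedule of contiguous phases. The block internals here (simple residual tanh maps) are hypothetical stand-ins for illustration, not the paper's actual ViT blocks or the Raptor training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
L, k, d = 12, 3, 8  # original depth, number of distinct blocks, feature dim

# k distinct "blocks"; each is a residual nonlinear map standing in for a
# full Transformer block (attention + MLP) in this toy example.
weights = [0.1 * rng.standard_normal((d, d)) for _ in range(k)]

def block(x, W):
    # Residual update, mirroring the skip-connection form of a ViT block.
    return x + np.tanh(x @ W)

def block_recurrent_forward(x, schedule):
    # schedule maps each of the L depth positions to one of the k blocks,
    # e.g. contiguous phases [0,0,0,0, 1,1,1,1, 2,2,2,2]: each block is
    # reused (applied recurrently) within its phase.
    for i in schedule:
        x = block(x, weights[i])
    return x

# L depth positions partitioned into k contiguous phases.
schedule = [i * k // L for i in range(L)]
x = rng.standard_normal(d)
y = block_recurrent_forward(x, schedule)
print(schedule)   # L entries drawn from only k distinct block indices
print(y.shape)
```

The point of the sketch is the schedule: the forward pass still takes $L$ steps, but the depth-wise parameter count scales with $k$, which is what a Raptor-style surrogate would be fit to match against the original network's activations.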