[2602.22719] Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
Summary
This paper investigates the interpretability and steerability of state-space models (SSMs) by identifying activation subspace bottlenecks and proposing a test-time steering intervention that improves performance across diverse benchmarks without task-specific tuning.
Why It Matters
As state-space models gain traction in machine learning, understanding their inner workings is crucial for improving their performance and applicability. This research addresses a significant gap in interpretability and offers practical methods to enhance model performance without extensive tuning, which is vital for advancing AI applications.
Key Takeaways
- Activation subspace bottlenecks in Mamba-family SSMs can be identified using tools from mechanistic interpretability.
- A simple test-time steering intervention — scaling bottleneck activations by a scalar — improves performance by an average of 8.27% across 5 SSMs and 6 benchmarks.
- Retraining with modified bottlenecks confirms that the identified bottlenecks were hindering performance.
- Stable-Mamba architecture shows promise for long-context performance gains.
- The findings contribute to the mechanistic interpretability of modern AI models.
Computer Science > Machine Learning
arXiv:2602.22719 (cs) [Submitted on 26 Feb 2026]
Title: Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
Authors: Vamshi Sunku Mohan, Kaustubh Gupta, Aneesha Das, Chandan Singh
Abstract: State-space models (SSMs) have emerged as an efficient strategy for building powerful language models, avoiding the quadratic complexity of computing attention in transformers. Despite their promise, the interpretability and steerability of modern SSMs remain relatively underexplored. We take a major step in this direction by identifying activation subspace bottlenecks in the Mamba family of SSM models using tools from mechanistic interpretability. We then introduce a test-time steering intervention that simply multiplies the activations of the identified bottlenecks by a scalar. Across 5 SSMs and 6 diverse benchmarks, this intervention improves performance by an average of 8.27%, without requiring any task-specific tuning. Finally, we validate that the identified bottlenecks are indeed hindering performance by modifying them to yield an architecture we call Stable-Mamba, which achieves long-context performance gains when retrained from scratch.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.22719 [cs.LG] (or arXiv:2602.22719v1 [cs.LG] for this version)
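The steering intervention the abstract describes — multiplying the activations of identified bottleneck dimensions by a scalar at test time — can be sketched in a few lines. The sketch below is a minimal illustration of that idea only; the bottleneck indices, scalar value, and array shapes are hypothetical assumptions, not the authors' implementation, and identifying the bottleneck subspace in a real model would require the paper's mechanistic-interpretability analysis.

```python
import numpy as np

def steer_activations(activations, bottleneck_dims, scale):
    """Scale activations in a bottleneck subspace by a scalar.

    activations:     (seq_len, d_model) array of layer activations
    bottleneck_dims: indices of the (assumed pre-identified) bottleneck
                     subspace within the hidden dimension
    scale:           test-time steering scalar (hypothetical value)
    Dimensions outside the bottleneck are left unchanged.
    """
    steered = activations.copy()
    steered[:, bottleneck_dims] *= scale
    return steered

# Toy demonstration with random activations: 4 tokens, hidden size 8.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))
bottleneck = [2, 5]  # hypothetical bottleneck dimensions
out = steer_activations(acts, bottleneck, scale=1.5)

# Only the bottleneck dimensions are rescaled.
other = [0, 1, 3, 4, 6, 7]
assert np.allclose(out[:, bottleneck], 1.5 * acts[:, bottleneck])
assert np.allclose(out[:, other], acts[:, other])
```

In a real SSM this rescaling would typically be applied inside the forward pass (e.g. via a forward hook on the relevant layer) rather than on a standalone array, but the core operation is the same elementwise scalar multiply.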