[2510.02823] The Curious Case of In-Training Compression of State Space Models
Summary
This paper explores in-training compression techniques for State Space Models (SSMs), demonstrating how selective dimension preservation during training can enhance computational efficiency and model performance.
Why It Matters
The study addresses a significant challenge in machine learning: balancing model complexity with computational efficiency. By introducing a method that compresses models during training, it opens avenues for faster optimization without sacrificing performance, which is crucial for applications requiring real-time processing.
Key Takeaways
- In-training compression can significantly accelerate optimization of State Space Models.
- The proposed method, CompreSSM, preserves task-critical structures while reducing model dimensions.
- Models compressed from a larger state dimension during training retain high expressivity and outperform models trained at the smaller dimension from scratch.
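To see why reducing the state dimension pays off, consider the recurrent core of an LTI SSM such as a Linear Recurrent Unit: with a diagonal (complex) state matrix, each update costs O(n) elementwise work plus the input/output mixes, so every pruned dimension directly cuts per-step cost. The sketch below is illustrative, not the paper's implementation; the function names and the diagonal complex parameterization are assumptions in the spirit of LRU-style models.

```python
import numpy as np

def diagonal_ssm_step(lam, B, C, x, u_k):
    """One update of a diagonal LTI SSM (LRU-style sketch).

    lam: (n,) complex eigenvalues with |lam| < 1 for stability
    B:   (n, m) input matrix; C: (p, n) output matrix
    The state update is elementwise -- no dense n x n matmul --
    so shrinking n directly reduces per-step inference cost.
    """
    x = lam * x + B @ u_k   # O(n) recurrence plus O(n*m) input mix
    y = (C @ x).real        # real-valued readout
    return x, y

def run_ssm(lam, B, C, inputs):
    """Scan the recurrence over a sequence of inputs of shape (T, m)."""
    x = np.zeros(lam.shape[0], dtype=complex)
    outputs = []
    for u_k in inputs:
        x, y = diagonal_ssm_step(lam, B, C, x, u_k)
        outputs.append(y)
    return np.stack(outputs)
```

The diagonal form is equivalent to running the dense recurrence with `A = np.diag(lam)`, which is what makes per-dimension energy analysis and pruning natural.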
Paper Details
[Submitted on 3 Oct 2025 (v1), last revised 24 Feb 2026 (this version, v4)]
Authors: Makram Chahine, Philipp Nazari, Daniela Rus, T. Konstantin Rusch
Abstract: State Space Models (SSMs), developed to tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference. At their core are recurrent dynamical systems that maintain a hidden state, with update costs scaling with the state dimension. A key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden. Control theory, and more specifically Hankel singular value analysis, provides a potent framework for measuring the energy of each state, as well as for balanced truncation of the original system down to a smaller representation with performance guarantees. Leveraging the eigenvalue stability properties of Hankel matrices, we apply this lens to SSMs during training, where only dimensions of high influence are identified and preserved. Our approach, CompreSSM, applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models. Experiments show that in-training reduction signifi...
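The Hankel-singular-value machinery the abstract leans on can be sketched for a discrete-time LTI system: solve the two Lyapunov equations for the controllability and observability Gramians, take the Hankel singular values as the square roots of the eigenvalues of their product, and keep only the high-energy dimensions via square-root balanced truncation. This is a textbook control-theory sketch, not the paper's CompreSSM algorithm; the function names and the dense-matrix setting are assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, cholesky, svd

def hankel_singular_values(A, B, C):
    """Gramians and Hankel singular values of x_{k+1} = A x_k + B u_k, y_k = C x_k."""
    # Controllability Gramian: P = A P A^T + B B^T
    P = solve_discrete_lyapunov(A, B @ B.T)
    # Observability Gramian: Q = A^T Q A + C^T C
    Q = solve_discrete_lyapunov(A.T, C.T @ C)
    # Hankel singular values: sqrt of eigenvalues of P Q, sorted descending
    hsv = np.sort(np.sqrt(np.abs(np.linalg.eigvals(P @ Q).real)))[::-1]
    return hsv, P, Q

def balanced_truncation(A, B, C, r):
    """Square-root balanced truncation to the r highest-energy state dimensions."""
    _, P, Q = hankel_singular_values(A, B, C)
    Lp = cholesky(P, lower=True)
    Lq = cholesky(Q, lower=True)
    U, s, Vt = svd(Lq.T @ Lp)              # singular values s are the HSVs
    S_inv_sqrt = np.diag(1.0 / np.sqrt(s[:r]))
    T = Lp @ Vt[:r].T @ S_inv_sqrt          # right transform, shape (n, r)
    Ti = S_inv_sqrt @ U[:, :r].T @ Lq.T     # left transform, shape (r, n)
    return Ti @ A @ T, Ti @ B, C @ T
```

With `r = n` the transform is a pure change of basis (the balanced realization), and the paper's in-training twist is to apply a truncation like this repeatedly while optimization proceeds, so low-energy dimensions are dropped rather than carried to convergence.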