[2509.17874] Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models
Computer Science > Machine Learning

arXiv:2509.17874 (cs)

[Submitted on 22 Sep 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models

Authors: Paulius Rauba, Mihaela van der Schaar

Abstract: Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem present a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods like slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose Nested Subspace Networks (NSNs), a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers to satisfy a nested subspace property, such that the function computed at a given rank is a strict subspace of the function at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that...
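The nested subspace property lends itself to a compact illustration. The following is a minimal PyTorch sketch, assuming the linear layer is factorized as W = U diag(s) V^T and evaluated by truncating to the first r rank-one components; the class name NestedSubspaceLinear, the parameters U, s, V, and the max_rank/rank arguments are illustrative assumptions, not the paper's implementation. The joint-training loop further assumes a learned-uncertainty weighting across ranks in the style of Kendall and Gal (2018), since the truncated abstract does not spell out the objective.

```python
import torch
import torch.nn as nn

class NestedSubspaceLinear(nn.Module):
    """Sketch of a nested-subspace linear layer: W = U diag(s) V^T.

    Evaluating with the first r components reuses exactly the parameters
    of every lower rank, so the rank-r function is nested inside the
    function at any higher rank (the property described in the abstract).
    """

    def __init__(self, in_features: int, out_features: int, max_rank: int):
        super().__init__()
        self.max_rank = max_rank
        self.U = nn.Parameter(torch.randn(out_features, max_rank) / max_rank ** 0.5)
        self.s = nn.Parameter(torch.ones(max_rank))
        self.V = nn.Parameter(torch.randn(in_features, max_rank) / in_features ** 0.5)

    def forward(self, x: torch.Tensor, rank: int | None = None) -> torch.Tensor:
        r = self.max_rank if rank is None else min(rank, self.max_rank)
        z = x @ self.V[:, :r]        # project into the leading r-dim subspace
        z = z * self.s[:r]           # per-component scaling
        return z @ self.U[:, :r].T   # map back to the output space


# Hypothetical joint optimization over several ranks, weighting each
# rank's loss by a learned uncertainty term; this weighting scheme is an
# assumption, not a detail confirmed by the truncated abstract.
ranks = (16, 32, 64)
layer = NestedSubspaceLinear(512, 512, max_rank=64)
log_sigma = nn.Parameter(torch.zeros(len(ranks)))  # one log-std per rank
opt = torch.optim.Adam(list(layer.parameters()) + [log_sigma], lr=1e-3)

x, target = torch.randn(8, 512), torch.randn(8, 512)
for _ in range(100):
    opt.zero_grad()
    loss = torch.tensor(0.0)
    for i, r in enumerate(ranks):
        mse = nn.functional.mse_loss(layer(x, rank=r), target)
        loss = loss + mse / (2 * log_sigma[i].exp() ** 2) + log_sigma[i]
    loss.backward()
    opt.step()
```

Under these assumptions, the same trained weights can be queried at any rank r up to max_rank at inference time, trading accuracy for compute without retraining or storing a separate specialist model per budget.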