[2310.01770] A simple connection from loss flatness to compressed neural representations
Summary
This article examines the relationship between loss flatness and compressed neural representations, introducing three new compression measures and validating the predicted connection empirically across feedforward, convolutional, and transformer architectures.
Why It Matters
Understanding the connection between loss flatness and representation compression is crucial for improving neural network performance. This research provides insights that could help resolve ongoing debates in the field regarding sharpness and generalization, potentially leading to better model training strategies.
Key Takeaways
- Sharpness of loss minima is linked to the geometric structure of neural representations.
- Three new measures — Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality — are introduced to quantify representation compression.
- Flatter minima are shown to limit representation compression, providing a new perspective on model performance.
- Empirical validation across various architectures supports the theoretical findings.
- The research offers a resolution to conflicting views on the sharpness-generalization relationship.
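Sharpness, as used above, is the trace of the loss Hessian at a minimum. For networks of realistic size the trace is rarely computed exactly; a standard approach (not specific to this paper) is Hutchinson's estimator, tr(H) = E[vᵀHv] for random Rademacher vectors v. A minimal numpy sketch, using a toy PSD matrix in place of a real Hessian:

```python
import numpy as np

# Hutchinson trace estimation sketch: a toy symmetric PSD matrix H stands
# in for the loss Hessian. For a real network, hvp() would be an autodiff
# Hessian-vector product (no explicit Hessian materialized).
rng = np.random.default_rng(1)
G = rng.normal(size=(5, 5))
H = G @ G.T  # symmetric PSD "Hessian"

def hvp(v):
    """Hessian-vector product; replace with autodiff for a real model."""
    return H @ v

samples = []
for _ in range(5000):
    v = rng.choice([-1.0, 1.0], size=5)  # Rademacher probe vector
    samples.append(v @ hvp(v))           # unbiased sample of tr(H)
sharpness_est = np.mean(samples)
```

The estimator only needs Hessian-vector products, which is why it scales to large models where forming H explicitly is infeasible.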
Computer Science > Machine Learning
arXiv:2310.01770 (cs)
Submitted on 3 Oct 2023 (v1), last revised 22 Feb 2026 (this version, v5)
Title: A simple connection from loss flatness to compressed neural representations
Authors: Shirui Chen, Stefano Recanatesi, Eric Shea-Brown
Abstract: Despite extensive study, the significance of sharpness (the trace of the loss Hessian at local minima) remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures, Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality, and derive upper bounds showing these are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we validate consistent positive correlations across feedforward, convolutional, and transformer architectures. Our results suggest that sharpness fundamentally quantifies representation compression, offering a principled...
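The three measures can be illustrated on a toy map. The sketch below is an assumed operationalization, not the paper's implementation: MLS as the largest singular value of the input-to-representation Jacobian, LVR from the product of singular values (the local volume scaling factor), and local dimensionality as a participation ratio of the squared singular values. The two-layer tanh network and finite-difference Jacobian are illustrative choices.

```python
import numpy as np

def representation(x, W1, W2):
    """Toy two-layer network mapping inputs to a representation."""
    return W2 @ np.tanh(W1 @ x)

def jacobian_fd(f, x, eps=1e-6):
    """Forward finite-difference Jacobian of f at x (illustrative only;
    a real implementation would use autodiff)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.5
W2 = rng.normal(size=(3, 4)) * 0.5
x0 = rng.normal(size=3)

J = jacobian_fd(lambda x: representation(x, W1, W2), x0)
s = np.linalg.svd(J, compute_uv=False)   # singular values, descending

mls = s[0]                               # worst-case local amplification
lvr = np.prod(s)                         # local volume ratio (< 1 => compression)
local_dim = (s**2).sum()**2 / (s**4).sum()  # participation ratio of s^2
```

Smaller singular values mean the representation contracts local input perturbations more strongly, which is the sense of "compression" the paper's bounds tie to sharpness.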