[2310.01770] A simple connection from loss flatness to compressed neural representations


arXiv - AI

Summary

This article explores the relationship between loss flatness and compressed neural representations, introducing three new measures of compression and validating the theory empirically across several neural network architectures.

Why It Matters

Understanding the connection between loss flatness and representation compression is crucial for improving neural network performance. This research provides insights that could help resolve ongoing debates in the field regarding sharpness and generalization, potentially leading to better model training strategies.

Key Takeaways

  • Sharpness of loss minima is linked to the geometric structure of neural representations.
  • Three new measures (LVR, MLS, Local Dimensionality) are introduced to quantify representation compression.
  • Flatter minima are shown to limit representation compression, providing a new perspective on model performance.
  • Empirical validation across various architectures supports the theoretical findings.
  • The research offers a resolution to conflicting views on the sharpness-generalization relationship.
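As a rough illustration of the sharpness quantity referenced above (the trace of the loss Hessian at a local minimum), here is a minimal numpy sketch on a toy quadratic loss. The quadratic landscape and the finite-difference estimator are assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy quadratic loss L(w) = 0.5 w^T H w with a fixed PSD "Hessian" H.
# This stands in for a loss landscape near a local minimum (illustrative only).
A = rng.normal(size=(5, 5))
H_true = A @ A.T

def loss(w):
    return 0.5 * w @ H_true @ w

def hessian_trace(f, w, eps=1e-4):
    """Estimate tr(Hessian of f at w) with central second differences per coordinate."""
    f0 = f(w)
    tr = 0.0
    for i in range(w.size):
        dw = np.zeros_like(w)
        dw[i] = eps
        tr += (f(w + dw) - 2.0 * f0 + f(w - dw)) / eps**2
    return tr

w_min = np.zeros(5)  # the exact minimum of this quadratic
sharpness = hessian_trace(loss, w_min)
print(sharpness, np.trace(H_true))  # for a quadratic, the estimate matches the true trace
```

For a quadratic loss the central-difference estimate is exact up to floating-point error, so the printed values coincide; on a real network the same estimator would only approximate the local Hessian trace.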

Computer Science > Machine Learning

arXiv:2310.01770 (cs)
Submitted on 3 Oct 2023 (v1), last revised 22 Feb 2026 (this version, v5)

Title: A simple connection from loss flatness to compressed neural representations
Authors: Shirui Chen, Stefano Recanatesi, Eric Shea-Brown

Abstract: Despite extensive study, the significance of sharpness -- the trace of the loss Hessian at local minima -- remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures -- Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality -- and derive upper bounds showing these are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we validate consistent positive correlations across feedforward, convolutional, and transformer architectures. Our results suggest that sharpness fundamentally quantifies representation compression, offering a principled...
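The three compression measures named in the abstract can be sketched on a toy one-layer representation. The finite-difference Jacobian, the log-volume proxy for LVR, and the participation-ratio form of local dimensionality below are plausible readings of the stated definitions, not the paper's exact formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy representation h(x) = tanh(W x): 8-dim input -> 16-dim activations.
W = rng.normal(size=(16, 8)) / np.sqrt(8)

def rep(x):
    return np.tanh(W @ x)

def jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian of f at x (rows: outputs, cols: inputs)."""
    y0 = f(x)
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - y0) / eps
    return J

x = rng.normal(size=8)
s = np.linalg.svd(jacobian(rep, x), compute_uv=False)  # local singular values

mls = s[0]                                   # Maximum Local Sensitivity: top singular value
log_lvr = np.sum(np.log(s))                  # log of a local volumetric ratio (product of singular values)
local_dim = (s**2).sum()**2 / (s**4).sum()   # participation ratio: effective local dimensionality
```

Smaller singular values mean nearby inputs are mapped to a tighter cluster of activations, so lower MLS, lower volumetric ratio, and lower local dimensionality all correspond to stronger compression, which is the quantity the paper's sharpness bounds constrain.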

