[2602.14423] The geometry of invariant learning: an information-theoretic analysis of data augmentation and generalization
Summary
This paper presents an information-theoretic framework for analyzing how data augmentation affects generalization and invariance learning in machine learning.
Why It Matters
Understanding the theoretical underpinnings of data augmentation is crucial for improving machine learning models. This analysis clarifies how augmentation affects generalization, which can inform augmentation design and lead to more robust models across AI applications.
Key Takeaways
- Introduces a framework linking data augmentation to generalization through mutual information (a standard bound of this type is sketched after this list).
- Decomposes the expected generalization gap into three components: distributional divergence, stability, and sensitivity.
- Defines 'group diameter' as a control parameter that balances data fidelity and regularization.
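For context, the mutual-information route in the first takeaway can be made concrete. The sketch below states the standard bound of Xu and Raginsky (2017), which controls the expected generalization gap by the information the algorithm retains about its training sample; this is the kind of bound the abstract says the framework builds upon, not the paper's new result.

```latex
% Background sketch (not the paper's refined bound): the standard
% mutual-information generalization bound of Xu & Raginsky (2017).
% W = output hypothesis, S = (Z_1, ..., Z_n) = training sample, and the
% loss \ell(w, Z) is assumed \sigma-sub-Gaussian under the data distribution.
\[
  \Bigl|\,\mathbb{E}\bigl[L_{\mu}(W) - L_{S}(W)\bigr]\,\Bigr|
  \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}}
\]
% L_mu(W): population risk, L_S(W): empirical risk on the sample S,
% I(W; S): mutual information between the hypothesis and the training set.
```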
Computer Science > Machine Learning
arXiv:2602.14423 (cs)
[Submitted on 16 Feb 2026]

Title: The geometry of invariant learning: an information-theoretic analysis of data augmentation and generalization
Authors: Abdelali Bouyahia, Frédéric LeBlanc, Mario Marchand

Abstract: Data augmentation is one of the most widely used techniques to improve generalization in modern machine learning, often justified by its ability to promote invariance to label-irrelevant transformations. However, its theoretical role remains only partially understood. In this work, we propose an information-theoretic framework that systematically accounts for the effect of augmentation on generalization and invariance learning. Our approach builds upon mutual information-based bounds, which relate the generalization gap to the amount of information a learning algorithm retains about its training data. We extend this framework by modeling the augmented distribution as a composition of the original data distribution with a distribution over transformations, which naturally induces an orbit-averaged loss function. Under mild sub-Gaussian assumptions on the loss function and the augmentation process, we derive a new generalization bound that decomposes the expected generalization gap ...
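The abstract's "orbit-averaged loss" has a simple operational reading: replace the per-example loss with its average over transformations drawn from the augmentation distribution. Below is a minimal, hypothetical Python/PyTorch sketch of a Monte Carlo estimator of that quantity; the names (`sample_transform`, `orbit_averaged_loss`), the horizontal-flip transformation, and the sample count are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def sample_transform():
    """Sample a label-preserving transformation t ~ T.
    Illustrative choice: a random horizontal flip of image tensors."""
    if torch.rand(()) < 0.5:
        return lambda x: torch.flip(x, dims=[-1])
    return lambda x: x

def orbit_averaged_loss(model, x, y, num_samples=8):
    """Monte Carlo estimate of the orbit-averaged loss
    E_{t ~ T}[ loss(model(t(x)), y) ] for a batch (x, y)."""
    losses = []
    for _ in range(num_samples):
        t = sample_transform()                        # t ~ T
        losses.append(F.cross_entropy(model(t(x)), y))
    return torch.stack(losses).mean()
```

Under this reading, the "group diameter" of the third takeaway acts as a natural knob: a larger set of transformations averages over bigger orbits, trading data fidelity for stronger regularization.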