[2510.01349] To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
arXiv:2510.01349 (cs), Computer Science > Machine Learning
Submitted on 1 Oct 2025 (v1), last revised 30 Mar 2026 (this version, v2)
Authors: Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro, Yuxuan Chen, Tess Smidt, Robin Walters

Abstract: Symmetry-aware methods for machine learning, such as data augmentation and equivariant architectures, encourage correct model behavior on all transformations (e.g. rotations or permutations) of the original dataset. These methods can improve generalization and sample efficiency, under the assumption that the transformed datapoints are highly probable, or "important", under the test distribution. In this work, we develop a method for critically evaluating this assumption. In particular, we propose a metric to quantify the amount of symmetry breaking in a dataset, via a two-sample classifier test that distinguishes between the original dataset and its randomly augmented equivalent. We validate our metric on synthetic datasets, and then use it to uncover surprisingly high degrees of symmetry-breaking in several benchmark point cloud datasets, constituting a severe form of dataset bias. We show theoretically that distributional symmetry-breaking can prevent invariant methods from performing...
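
The two-sample classifier test mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 2D toy data, the rotation group, the k-nearest-neighbour classifier, and every function name below (`sample_points`, `symmetry_breaking_score`, and so on) are assumptions chosen for illustration. The idea is to keep one half of the dataset as is, apply a random rotation to each point in the other half, and measure how well a classifier tells the two halves apart on held-out points. Accuracy near 0.5 suggests the data distribution is already close to rotation-symmetric, while accuracy near 1 suggests strong distributional symmetry breaking.

```python
import math
import random


def rotate(point, angle):
    """Rotate a 2D point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def sample_points(n, angle_std, rng):
    """Toy dataset (an assumption): points in an annulus.

    angle_std=None draws angles uniformly (rotation-symmetric data);
    otherwise angles are Gaussian around 0 (canonically oriented data).
    """
    points = []
    for _ in range(n):
        radius = rng.uniform(0.5, 1.5)
        if angle_std is None:
            angle = rng.uniform(0.0, 2.0 * math.pi)
        else:
            angle = rng.gauss(0.0, angle_std)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


def knn_predict(train, query, k):
    """Majority vote among the k nearest labeled training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def symmetry_breaking_score(points, rng, k=15):
    """Held-out accuracy of a classifier separating original from augmented data."""
    points = list(points)
    rng.shuffle(points)
    half = len(points) // 2
    # Label 1: untouched points. Label 0: a disjoint half, each randomly rotated.
    labeled = [(p, 1) for p in points[:half]]
    labeled += [(rotate(p, rng.uniform(0.0, 2.0 * math.pi)), 0) for p in points[half:]]
    rng.shuffle(labeled)
    split = len(labeled) // 2
    train, test = labeled[:split], labeled[split:]
    correct = sum(knn_predict(train, p, k) == label for p, label in test)
    return correct / len(test)


if __name__ == "__main__":
    rng = random.Random(0)
    canonical = sample_points(800, 0.3, rng)   # oriented data: symmetry broken
    symmetric = sample_points(800, None, rng)  # uniform angles: symmetric
    print("canonical:", symmetry_breaking_score(canonical, rng))
    print("symmetric:", symmetry_breaking_score(symmetric, rng))
```

Using disjoint halves for the original and augmented samples keeps the two classes independent, so the classifier cannot exploit matched pairs. In this toy setting the canonically oriented dataset should score well above 0.5 and the uniformly oriented one close to 0.5. Any classifier could stand in for the nearest-neighbour vote used here.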