[2510.01349] To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking


arXiv - Machine Learning 4 min read


Computer Science > Machine Learning · arXiv:2510.01349 (cs)

Submitted on 1 Oct 2025 (v1), last revised 30 Mar 2026 (this version, v2)

Title: To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

Authors: Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro, Yuxuan Chen, Tess Smidt, Robin Walters

Abstract: Symmetry-aware methods for machine learning, such as data augmentation and equivariant architectures, encourage correct model behavior on all transformations (e.g. rotations or permutations) of the original dataset. These methods can improve generalization and sample efficiency, under the assumption that the transformed datapoints are highly probable, or "important", under the test distribution. In this work, we develop a method for critically evaluating this assumption. In particular, we propose a metric to quantify the amount of symmetry breaking in a dataset, via a two-sample classifier test that distinguishes between the original dataset and its randomly augmented equivalent. We validate our metric on synthetic datasets, and then use it to uncover surprisingly high degrees of symmetry breaking in several benchmark point cloud datasets, constituting a severe form of dataset bias. We show theoretically that distributional symmetry breaking can prevent invariant methods from performing...
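The core idea of the proposed metric — train a classifier to tell the original dataset apart from a randomly augmented copy, and read its held-out accuracy as a symmetry-breaking score — can be illustrated on toy 2D data. The sketch below is not the paper's implementation: the datasets, the quadratic feature map, and the logistic-regression classifier are all illustrative assumptions. An accuracy near 0.5 means the augmented copy is indistinguishable from the original (the distribution is rotation-symmetric); accuracy well above 0.5 signals distributional symmetry breaking.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def random_rotations(n, rng):
    """n independent random 2D rotation matrices, shape (n, 2, 2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    c, s = np.cos(theta), np.sin(theta)
    return np.stack([np.stack([c, -s], -1), np.stack([s, c], -1)], -2)

def symmetry_breaking_score(X, rng):
    """Two-sample classifier test: original X (label 0) vs. its
    randomly rotated copy (label 1). Returns held-out accuracy."""
    R = random_rotations(len(X), rng)
    X_aug = np.einsum("nij,nj->ni", R, X)  # rotate each point independently

    # Quadratic features so a linear classifier can pick up orientation bias
    # (raw coordinates alone would not separate these distributions linearly).
    def feats(Z):
        return np.column_stack([Z, Z[:, 0] ** 2, Z[:, 1] ** 2, Z[:, 0] * Z[:, 1]])

    Z = np.vstack([feats(X), feats(X_aug)])
    y = np.concatenate([np.zeros(len(X)), np.ones(len(X))])
    Ztr, Zte, ytr, yte = train_test_split(Z, y, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Ztr, ytr)
    return clf.score(Zte, yte)

# Rotation-symmetric case: isotropic Gaussian, augmentation changes nothing.
acc_sym = symmetry_breaking_score(rng.normal(size=(4000, 2)), rng)

# Symmetry-broken case: variance concentrated along the x-axis, so the
# original points have a preferred orientation that random rotation destroys.
X_aniso = rng.normal(size=(4000, 2)) * np.array([3.0, 0.3])
acc_broken = symmetry_breaking_score(X_aniso, rng)

print(f"symmetric dataset: {acc_sym:.2f}, anisotropic dataset: {acc_broken:.2f}")
```

On the isotropic data the classifier hovers around chance, while on the anisotropic data it detects the orientation bias easily — the same diagnostic the paper applies to benchmark point cloud datasets at larger scale.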

Originally published on March 31, 2026. Curated by AI News.


