[2603.03507] Solving adversarial examples requires solving exponential misalignment
Computer Science > Machine Learning

arXiv:2603.03507 (cs)

[Submitted on 3 Mar 2026]

Title: Solving adversarial examples requires solving exponential misalignment

Authors: Alessandro Salvatore, Stanislav Fort, Surya Ganguli

Abstract: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning and a phenomenon with mysterious origins. To shed light on their origin, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests an exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not by humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore ma...
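
The abstract's key quantitative step - that volume grows exponentially with dimension, so a dimension gap between machine and human PMs implies exponentially many machine-only inputs - can be made concrete with a back-of-the-envelope calculation. The sketch below is not from the paper: the linear extent L and the dimensions d_human and d_machine are invented placeholders rather than the paper's measured values, and modeling a PM as a set whose volume scales as L**d is a simplifying assumption.

    # Minimal sketch (assumptions labeled below, not the paper's method or data):
    # model each set as having volume ~ extent**dim, and compare a hypothetical
    # human concept against a hypothetical machine perceptual manifold (PM).
    import math

    def log10_volume(extent: float, dim: int) -> float:
        """log10 of the volume of a dim-dimensional set of linear extent
        `extent` (volume ~ extent**dim), kept in log space to avoid overflow."""
        return dim * math.log10(extent)

    L = 2.0           # assumed linear extent of both sets, arbitrary input units
    d_human = 50      # hypothetical intrinsic dimension of a human concept
    d_machine = 1000  # hypothetical intrinsic dimension of a network's PM

    gap = log10_volume(L, d_machine) - log10_volume(L, d_human)
    print(f"machine PM volume exceeds human concept volume by ~10^{gap:.0f}")
    # -> machine PM volume exceeds human concept volume by ~10^286

Under this toy model, every extra dimension multiplies the volume by L, so the excess region - inputs the machine confidently assigns to the concept but a human would not - grows exponentially in the dimension gap d_machine - d_human, which is the sense in which the misalignment is "exponential".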