[2604.00208] Measuring the Representational Alignment of Neural Systems in Superposition
Computer Science > Machine Learning
arXiv:2604.00208 (cs)
[Submitted on 31 Mar 2026]

Title: Measuring the Representational Alignment of Neural Systems in Superposition
Authors: Sunny Liu, Habon Issa, André Longon, Liv Gorton, Meenakshi Khosla, David Klindt

Abstract: Comparing the internal representations of neural networks is a central goal in both neuroscience and machine learning. Standard alignment metrics operate on raw neural activations, implicitly assuming that similar representations produce similar activity patterns. However, neural systems frequently operate in superposition, encoding more features than they have neurons via linear compression. We derive closed-form expressions showing that superposition systematically deflates Representational Similarity Analysis, Centered Kernel Alignment, and linear regression, causing networks with identical feature content to appear dissimilar. The root cause is that these metrics depend on the cross-similarity between the two systems' respective superposition matrices, which, under an assumption of random projections, typically differ substantially, rather than on the latent features themselves: alignment scores conflate what a system represents with how it represents it. Under partial feature overlap, this confound can invert the expected ordering, making systems sharing fewe...
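The deflation effect described in the abstract can be illustrated with a small numerical sketch (our own illustration under assumed conditions, not code or notation from the paper): two systems encode the *same* latent features but compress them through *different* random superposition matrices, and linear CKA on the resulting activations falls well below 1 even though the feature content is identical.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_cka(X, Y):
    """Linear CKA between two activation matrices (samples x units)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# Hypothetical setup: both systems share identical latent features Z
# (more features than neurons), but each compresses Z into neuron space
# through its own random projection -- i.e., superposition.
n_samples, n_features, n_neurons = 1000, 50, 20
Z = rng.standard_normal((n_samples, n_features))
W1 = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)
W2 = rng.standard_normal((n_features, n_neurons)) / np.sqrt(n_neurons)

A1, A2 = Z @ W1, Z @ W2  # observed neural activations of each system

print(linear_cka(Z, Z))    # identical latents: CKA = 1.0
print(linear_cka(A1, A2))  # same features, different projections: deflated
```

Because the two random projection matrices have low cross-similarity, the activation-level CKA score is driven by *how* each system embeds the features, not *what* it represents, which is the conflation the abstract points to.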