[2603.22301] Latent Semantic Manifolds in Large Language Models
Computer Science > Machine Learning
arXiv:2603.22301 (cs)
[Submitted on 17 Mar 2026]

Title: Latent Semantic Manifolds in Large Language Models
Authors: Mohamed A. Mabrok

Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states as points on a latent semantic manifold: a Riemannian submanifold equipped with the Fisher information metric, where tokens correspond to Voronoi regions partitioning the manifold. We define the expressibility gap, a geometric measure of the semantic distortion introduced by vocabulary discretization, and prove two theorems: a rate-distortion lower bound on distortion for any finite vocabulary, and a linear volume scaling law for the expressibility gap via the coarea formula. We validate these predictions across six transformer architectures (124M-1.5B parameters), confirming universal hourglass intrinsic-dimension profiles, smooth curvature structure, and linear gap scaling with slopes 0.87-1.12 (R^2 > 0.985). The margin distribution across models reveals a persistent hard core of boundary-proximal representations that is invariant to scale, providing a geometric decomposition of perplexity. We discuss implications for arch...
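The abstract names its central objects without defining them; the following is a minimal sketch under standard conventions, not the paper's exact formulation. Writing p(t | h) = softmax(W_U h) for the next-token distribution induced by a hidden state h through the unembedding matrix W_U (this notation is assumed here, not taken from the abstract), the pulled-back Fisher information metric on the hidden-state manifold is

% Sketch under assumed notation; W_U, p(t|h), and mu_t are not from the abstract.
\[
g_{ij}(h) \;=\; \mathbb{E}_{t \sim p(\cdot \mid h)}\!\big[\, \partial_i \log p(t \mid h)\; \partial_j \log p(t \mid h) \,\big],
\]
and each token t carves out a Voronoi region of the manifold,
\[
V_t \;=\; \{\, h \in \mathcal{M} \;:\; p(t \mid h) \ge p(t' \mid h) \ \text{for all } t' \,\}.
\]
One plausible form of the expressibility gap of a state h is then its Fisher geodesic distance to a representative point \mu_{t(h)} of the region containing it,
\[
\operatorname{Gap}(h) \;=\; d_g\big(h,\, \mu_{t(h)}\big), \qquad t(h) = \arg\max_t \, p(t \mid h),
\]
which is the kind of quantization-distortion quantity that a rate-distortion lower bound for a finite vocabulary would constrain.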
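As a rough empirical counterpart, the log-probability margin between the top two candidate tokens at each position is a cheap proxy for how close a hidden state sits to a Voronoi boundary; the abstract's margins are presumably measured under the Fisher metric, so the sketch below (model choice, input text, and the margin definition are illustrative assumptions) only approximates the idea:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # A small causal LM in the 124M range the paper evaluates (choice is illustrative).
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tok("The quick brown fox jumps over the lazy", return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0]          # (seq_len, vocab_size)

    logprobs = torch.log_softmax(logits, dim=-1)
    top2 = logprobs.topk(2, dim=-1).values       # best and runner-up log-prob per position
    margin = top2[:, 0] - top2[:, 1]             # small margin => near a token boundary

    print(margin)  # positions with near-zero margin are the boundary-proximal states

Under this proxy, a scale-invariant mass of near-zero margins would correspond to the persistent hard core the abstract reports.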