[2603.20327] Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations
Computer Science > Machine Learning
arXiv:2603.20327 (cs)
[Submitted on 20 Mar 2026]

Title: Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations
Authors: Liu hung ming

Abstract: Video world models trained with Joint Embedding Predictive Architectures (JEPA) acquire rich spatiotemporal representations by predicting masked regions in latent space rather than reconstructing pixels. This removes the visual verification pathway of generative models, creating a structural interpretability gap: the encoder has learned physical structure inaccessible in any inspectable form. Existing probing methods either operate in continuous space without a structured intermediate layer, or attach generative components whose parameters confound attribution of behavior to the encoder. We propose the AI Mother Tongue (AIM) framework as a passive quantization probe: a lightweight, vocabulary-free probe that converts V-JEPA 2 continuous latent vectors into discrete symbol sequences without task-specific supervision or modifying the encoder. Because the encoder is kept completely frozen, any symbolic structure in the AIM codebook is attributable entirely to V-JEPA 2 pre-trained representations, not to the probe. We evaluate through category-contrast experiments on Kinetics-mini...
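
The idea of a passive quantization probe can be illustrated with a minimal sketch. This is not the AIM implementation: the visible abstract does not specify the quantizer, so plain k-means stands in for it, and synthetic vectors stand in for frozen V-JEPA 2 latents. All function names (`fit_codebook`, `quantize`, `symbol_histogram`, `total_variation`) are hypothetical. The point is only that the encoder outputs are treated as fixed inputs, a codebook is fitted without labels, and categories are then compared through their discrete symbol distributions.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distances, shape (num_latents, num_codes).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def fit_codebook(latents, num_codes, num_iters=20, seed=0):
    """Fit a codebook with plain k-means (an assumed stand-in quantizer).

    The latents are never modified, mirroring a frozen encoder: only the
    codebook is learned, and no category labels are used.
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(latents), size=num_codes, replace=False)
    codebook = latents[init].copy()
    for _ in range(num_iters):
        symbols = quantize(latents, codebook)
        for k in range(num_codes):
            members = latents[symbols == k]
            if len(members) > 0:  # leave unused codes where they are
                codebook[k] = members.mean(axis=0)
    return codebook


def symbol_histogram(symbols, num_codes):
    """Normalized frequency of each discrete symbol."""
    counts = np.bincount(symbols, minlength=num_codes)
    return counts / counts.sum()


def total_variation(p, q):
    """Total variation distance between two symbol distributions."""
    return 0.5 * np.abs(p - q).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic placeholders for frozen-encoder latents of two categories.
    cat_a = rng.normal(0.0, 1.0, size=(200, 16))
    cat_b = rng.normal(1.5, 1.0, size=(200, 16))
    codebook = fit_codebook(np.concatenate([cat_a, cat_b]), num_codes=8)
    hist_a = symbol_histogram(quantize(cat_a, codebook), 8)
    hist_b = symbol_histogram(quantize(cat_b, codebook), 8)
    print("category contrast (TV distance):", total_variation(hist_a, hist_b))
```

Because the probe only reads the latents, any difference between the two symbol histograms in this sketch reflects structure already present in the input vectors, which is the attribution argument the abstract makes for a frozen encoder.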