[2602.22617] Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA
Summary
The paper introduces Semantic Tube Prediction (STP), a method that improves data efficiency in large language models (LLMs) by constraining hidden-state trajectories to a tube around a locally linear geodesic, allowing models to match baseline accuracy with significantly less training data.
Why It Matters
This research challenges existing scaling laws in LLMs by demonstrating that geometric priors can lead to improved data efficiency. As data costs rise, finding methods to optimize training processes is crucial for advancing AI capabilities without requiring massive datasets.
Key Takeaways
- STP allows LLMs to match baseline accuracy with 16× less training data on the NL-RX-SYNTH dataset.
- The Geodesic Hypothesis posits that token sequences follow geodesics on a semantic manifold.
- STP improves the signal-to-noise ratio and preserves output diversity by preventing trajectory collisions during inference.
- The method challenges traditional data-efficiency bounds in LLM training.
- Code for the proposed method is publicly available for further research.
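The abstract describes STP as a JEPA-style regularizer that confines hidden-state trajectories to a tubular neighborhood of a locally linear geodesic. The sketch below is only an illustration of that idea, not the paper's implementation: the function name, the straight-chord geodesic, and the tube radius are all assumptions.

```python
# Illustrative "semantic tube" penalty, based only on the abstract's
# description (hidden states confined to a tubular neighborhood of a
# locally linear geodesic). Names and the radius are hypothetical.
import numpy as np

def tube_penalty(hidden, radius=0.5):
    """Hinge penalty on how far intermediate hidden states stray from the
    straight chord (local geodesic) between the first and last state.

    hidden: (T, d) array of hidden states along one trajectory.
    """
    T = hidden.shape[0]
    start, end = hidden[0], hidden[-1]
    # Local-linearity assumption: treat the geodesic as a straight segment.
    alphas = np.linspace(0.0, 1.0, T)[:, None]
    chord = (1 - alphas) * start + alphas * end
    # Distance of each state from its interpolated point on the chord.
    dists = np.linalg.norm(hidden - chord, axis=1)
    # Penalize only the states that leave the tube of the given radius.
    return np.maximum(dists - radius, 0.0).sum()

# States lying exactly on the chord incur zero penalty.
on_chord = np.linspace([0.0, 0.0], [1.0, 1.0], 5)
assert tube_penalty(on_chord) == 0.0
```

In a training loop such a penalty would be added to the language-modeling loss as a regularizer; here it is shown standalone for clarity.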
Computer Science > Machine Learning
arXiv:2602.22617 (cs)
[Submitted on 26 Feb 2026]
Title: Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA
Authors: Hai Huang, Yann LeCun, Randall Balestriero
Abstract: Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptive rather than prescriptive: they characterize typical training, not optimal training. Surprisingly few works have successfully challenged the data-efficiency bounds implied by these laws -- which is our primary focus. To that end, we introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. Building on this principle, we propose a novel Semantic Tube Prediction (STP) task, a JEPA-style regularizer that confines hidden-state trajectories to a tubular neighborhood of the geodesic. STP generalizes JEPA to language without requiring explicit multi-view augmentations. We show this constraint improves signal-to-noise ratio, and consequently preserves diversity by preventing trajectory collisions during inference. Empirically, STP allows LLMs to match baseline accuracy with 16$\times$ less training data on the NL-RX-SYNTH dataset, dire...
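The scaling laws the abstract refers to are empirical power-law fits of loss against data (or compute, or parameters). A minimal illustration of fitting one such law for the data dimension, using synthetic numbers that are not from the paper:

```python
# Fit the power-law form L(D) = a * D**(-b) in log-log space.
# The data points below are synthetic, generated from a known law,
# purely to demonstrate the fitting procedure.
import numpy as np

def fit_power_law(D, L):
    """Least-squares fit of log(L) = log(a) - b*log(D); returns (a, b)."""
    slope, intercept = np.polyfit(np.log(D), np.log(L), 1)
    return np.exp(intercept), -slope

# Synthetic losses from a known law with a = 10, b = 0.25.
D = np.array([1e6, 1e7, 1e8, 1e9])
L = 10.0 * D ** -0.25
a_hat, b_hat = fit_power_law(D, L)
assert abs(a_hat - 10.0) < 1e-6 and abs(b_hat - 0.25) < 1e-6
```

Because the synthetic points lie exactly on a power law, the log-log fit recovers the exponent essentially perfectly; real training curves are noisier and often include an irreducible-loss offset term.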