[2602.13738] OneLatent: Single-Token Compression for Visual Latent Reasoning
Summary
The paper introduces OneLatent, a framework that compresses reasoning in visual tasks into a single token, significantly reducing output length while maintaining accuracy.
Why It Matters
OneLatent addresses the high inference cost of chain-of-thought prompting in large language models, offering a more efficient method for visual reasoning that is well suited to resource-constrained environments. This could broaden the range of AI systems that can afford step-by-step reasoning at inference time.
Key Takeaways
- OneLatent compresses reasoning into a single latent token, improving efficiency.
- The framework reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT.
- Achieves 99.80% on ProntoQA and 97.80% on ProsQA using a single latent token.
- Supports compression-constrained generalization, with compression ratios of up to 87.4×.
- Utilizes rendered CoT images for deterministic supervision, enhancing auditability.
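The supervision scheme in the takeaways above can be illustrated with a minimal sketch: hidden states from an OCR-style encoder reading the rendered CoT image are pooled into a single target vector, and the student's one latent token is regressed onto it. All names and the mean-pool/MSE choices here are illustrative assumptions, not the paper's actual objective or API.

```python
# Hypothetical sketch of single-latent-token supervision.
# Assumptions (not from the paper): mean pooling of teacher states,
# MSE regression loss, and toy 4-dim vectors in place of real encoder outputs.

def mean_pool(hidden_states):
    """Average a list of teacher hidden-state vectors into one target vector."""
    dim = len(hidden_states[0])
    n = len(hidden_states)
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]

def mse(a, b):
    """Mean squared error between the student's latent token and the target."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Teacher side: hidden states produced by an OCR encoder over the rendered
# CoT image (three toy vectors stand in for real DeepSeek-OCR outputs).
teacher_hidden_states = [
    [0.2, 0.4, 0.1, 0.3],
    [0.0, 0.6, 0.2, 0.2],
    [0.4, 0.2, 0.3, 0.1],
]
target = mean_pool(teacher_hidden_states)

# Student side: the single latent token emitted in place of a textual CoT.
latent_token = [0.1, 0.5, 0.2, 0.2]
loss = mse(latent_token, target)
print(round(loss, 4))  # → 0.005
```

Because the target is derived from a deterministic rendering of the textual steps, the supervision signal can be re-rendered and inspected, which is what gives the approach its auditability.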
Computer Science > Artificial Intelligence
arXiv:2602.13738 (cs.AI) [Submitted on 14 Feb 2026]
Title: OneLatent: Single-Token Compression for Visual Latent Reasoning
Authors: Bo Lv, Yasheng Sun, Junjie Wang, Haoxiang Shi
Abstract: Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address this, we present OneLatent, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by 6.8×. On long-chain logical reasoning, OneLatent reaches 99.80% on ProntoQA and 97.80% on ProsQA with one latent token, with compression up to 87.4×, supporting compression-constrained generalization.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.13738 [cs.AI]
DOI: https://doi.org/10.48550/arXiv.2602.13738