[2602.13738] OneLatent: Single-Token Compression for Visual Latent Reasoning
Summary
The paper introduces OneLatent, a framework that compresses reasoning in visual tasks into a single token, significantly reducing output length while maintaining accuracy.
Why It Matters
OneLatent addresses the high inference cost of chain-of-thought prompting in large language models, offering a more efficient method for visual reasoning that is well suited to resource-constrained environments. This could broaden the range of AI systems that can afford step-by-step reasoning at inference time.
Key Takeaways
- OneLatent compresses reasoning into a single latent token, improving efficiency.
- The framework reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT.
- Achieves 99.80% on ProntoQA and 97.80% on ProsQA using a single latent token.
- Supports compression-constrained generalization, with compression ratios of up to 87.4×.
- Utilizes rendered CoT images for deterministic supervision, enhancing auditability.
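The supervision scheme in the takeaways above can be illustrated with a minimal sketch: hidden states from an OCR-style encoder reading the rendered CoT image are pooled into a single target vector, and the student's one latent token is regressed onto it. All names and the mean-pool/MSE choices here are illustrative assumptions, not the paper's actual objective or API.

```python
# Hypothetical sketch of single-latent-token supervision.
# Assumptions (not from the paper): mean pooling of teacher states,
# MSE regression loss, and toy 4-dim vectors in place of real encoder outputs.

def mean_pool(hidden_states):
    """Average a list of teacher hidden-state vectors into one target vector."""
    dim = len(hidden_states[0])
    n = len(hidden_states)
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]

def mse(a, b):
    """Mean squared error between the student's latent token and the target."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Teacher side: hidden states produced by an OCR encoder over the rendered
# CoT image (three toy vectors stand in for real DeepSeek-OCR outputs).
teacher_hidden_states = [
    [0.2, 0.4, 0.1, 0.3],
    [0.0, 0.6, 0.2, 0.2],
    [0.4, 0.2, 0.3, 0.1],
]
target = mean_pool(teacher_hidden_states)

# Student side: the single latent token emitted in place of a textual CoT.
latent_token = [0.1, 0.5, 0.2, 0.2]
loss = mse(latent_token, target)
print(round(loss, 4))  # → 0.005
```

Because the target is derived from a deterministic rendering of the textual steps, the supervision signal can be re-rendered and inspected, which is what gives the approach its auditability.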
Computer Science > Artificial Intelligence
arXiv:2602.13738 (cs.AI) [Submitted on 14 Feb 2026]
Title: OneLatent: Single-Token Compression for Visual Latent Reasoning
Authors: Bo Lv, Yasheng Sun, Junjie Wang, Haoxiang Shi
Abstract: Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address this, we present OneLatent, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by 6.8×. On long-chain logical reasoning, OneLatent reaches 99.80% on ProntoQA and 97.80% on ProsQA with one latent token, with compression up to 87.4×, supporting compression-constrained generalization.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.13738 [cs.AI]
DOI: https://doi.org/10.48550/arXiv.2602.13738