[2602.13738] OneLatent: Single-Token Compression for Visual Latent Reasoning

arXiv - AI

Summary

The paper introduces OneLatent, a framework that compresses intermediate reasoning for visual tasks into a single latent token, substantially reducing output length while largely preserving accuracy.

Why It Matters

OneLatent addresses the high inference costs associated with chain-of-thought prompting in AI, offering a more efficient method for visual reasoning that can enhance performance in resource-constrained environments. This innovation could lead to broader applications in AI systems requiring efficient reasoning capabilities.

Key Takeaways

  • OneLatent compresses reasoning into a single latent token, improving efficiency.
  • The framework reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT.
  • Achieves high performance on benchmarks like ProntoQA (99.80%) and ProsQA (97.80%).
  • Supports compression-constrained generalization, with compression ratios up to 87.4× on long-chain tasks.
  • Utilizes rendered CoT images for deterministic supervision, enhancing auditability.
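The headline ratios above reduce to simple arithmetic. A minimal illustration follows; the absolute token counts and accuracies are hypothetical, chosen only so the ratios match the reported figures:

```python
def compression_ratio(baseline_tokens: int, compressed_tokens: int) -> float:
    """Output-length compression factor relative to textual CoT."""
    return baseline_tokens / compressed_tokens

# e.g., a 220-token textual rationale reduced to a 20-token output
print(compression_ratio(220, 20))  # -> 11.0

# the reported accuracy drop is relative to textual CoT
cot_acc, onelatent_acc = 93.0, 90.79  # hypothetical absolute accuracies
print(round(cot_acc - onelatent_acc, 2))  # -> 2.21
```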

Computer Science > Artificial Intelligence

arXiv:2602.13738 (cs.AI) [Submitted on 14 Feb 2026]

Title: OneLatent: Single-Token Compression for Visual Latent Reasoning

Authors: Bo Lv, Yasheng Sun, Junjie Wang, Haoxiang Shi

Abstract: Chain-of-thought (CoT) prompting improves reasoning but often increases inference cost by one to two orders of magnitude. To address these challenges, we present OneLatent, a framework that compresses intermediate reasoning into a single latent token via supervision from rendered CoT images and DeepSeek-OCR hidden states. By rendering textual steps into images, we obtain a deterministic supervision signal that can be inspected and audited without requiring the model to output verbose textual rationales. Across benchmarks, OneLatent reduces average output length by 11× with only a 2.21% average accuracy drop relative to textual CoT, while improving output token contribution (OTC) by 6.8×. On long-chain logical reasoning, OneLatent reaches 99.80% on ProntoQA and 97.80% on ProsQA with one latent token, with compression up to 87.4×, supporting compression-constrained generalization.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.13738 [cs.AI] (arXiv:2602.13738v1 for this version), https://doi.org/10.48550/arXiv.2602.13738
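The supervision scheme in the abstract (a single student latent matched against hidden states of an OCR model run on the rendered CoT image) can be caricatured as a one-vector regression. The sketch below is a loose illustration under stated assumptions, not the paper's method: random vectors stand in for DeepSeek-OCR hidden states, and mean-pooling plus an MSE objective are assumed choices.

```python
import random

random.seed(0)
d = 16                               # hidden size (hypothetical)

# Teacher signal: hidden states of an OCR encoder run on the rendered
# CoT image (random placeholders here), mean-pooled to one vector.
patches = [[random.gauss(0, 1) for _ in range(d)] for _ in range(48)]
target = [sum(col) / len(patches) for col in zip(*patches)]

# Student: a single trainable latent token, regressed onto the target
# by plain gradient descent on an MSE loss.
latent = [0.0] * d
lr = 0.1
for _ in range(200):
    latent = [w - lr * 2 * (w - t) for w, t in zip(latent, target)]

mse = sum((w - t) ** 2 for w, t in zip(latent, target)) / d
print(f"final MSE: {mse:.2e}")
```

In the actual framework the latent would be produced by the reasoning model and trained end to end; the toy regression above only shows that one vector can absorb a pooled supervision target.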

