[2509.23365] Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Computer Science > Machine Learning
arXiv:2509.23365 (cs)
[Submitted on 27 Sep 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Authors: Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian

Abstract: Previous work shows that the chain of continuous thought (continuous CoT) improves the reasoning capability of large language models (LLMs) by enabling implicit parallel thinking, and a subsequent work provided theoretical insight by showing that a two-layer transformer equipped with continuous CoT can efficiently solve directed graph reachability by maintaining a superposition of multiple reasoning traces in the continuous thought. However, it remains unclear how the superposition mechanism is naturally learned from gradient-based training methods. To fill this gap, we theoretically analyze the training dynamics of a simplified two-layer transformer on the directed graph reachability problem to unveil how the superposition mechanism emerges during training in two training stages -- (i) a thought-generation stage that autoregressively expands the continuous thought, and (ii) a prediction stage that converts the thought into the final answer. Our analysis reveals that du...
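As a loose analogy (not the paper's transformer construction), the "superposition of multiple reasoning traces" for directed graph reachability can be pictured as parallel frontier expansion in breadth-first search: instead of following one path at a time, every partial trace is advanced simultaneously as a single set. A minimal sketch, with all names hypothetical:

```python
def reachable(adj, source):
    """Frontier-based reachability in a directed graph.

    The frontier set plays the role of a 'superposition': it holds the
    endpoints of all partial reasoning traces at once, and each step
    expands every trace in parallel rather than one at a time.
    """
    frontier = {source}          # superposed endpoints of all current traces
    visited = {source}
    while frontier:
        # one expansion step advances every trace in the superposition
        frontier = {v for u in frontier for v in adj.get(u, []) if v not in visited}
        visited |= frontier
    return visited

# Toy directed graph: node -> list of successors
adj = {0: [1, 2], 1: [3], 2: [3, 4], 4: [5]}
print(sorted(reachable(adj, 0)))  # -> [0, 1, 2, 3, 4, 5]
```

This runs in a number of expansion steps bounded by the graph diameter, mirroring the efficiency claim in the abstract: tracking the whole frontier at once avoids enumerating exponentially many individual paths.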