[2603.01042] Thoth: Mid-Training Bridges LLMs to Time Series Understanding
Computer Science > Computation and Language

arXiv:2603.01042 (cs)

[Submitted on 1 Mar 2026]

Title: Thoth: Mid-Training Bridges LLMs to Time Series Understanding

Authors: Jiafeng Lin, Yuxuan Wang, Jialong Wu, Huakun Luo, Zhongyi Pei, Jianmin Wang

Abstract: Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics. In this paper, we propose Thoth, the first family of mid-trained LLMs with general-purpose time series understanding capabilities. As a pivotal intermediate stage, mid-training achieves task- and domain-agnostic alignment between time series and natural language, for which we construct Book-of-Thoth, a high-quality, time-series-centric mid-training corpus. Book-of-Thoth enables both time-series-to-text and text-to-time-series generation, equipping LLMs with a foundational grasp of temporal patterns. To better evaluate advanced reasoning capabilities, we further present KnoTS, a novel benchmark of knowledge-intensive time series understanding, designed for joint reasoning over temporal patterns and domain knowledge. Extensive experiments demonstrate that mid-training with Book-of-Thoth enables Thoth to sig...
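The abstract does not specify how Book-of-Thoth serializes time series or phrases its prompts. As a minimal sketch of what bidirectional (time-series-to-text and text-to-time-series) mid-training pairs could look like, assuming a plain-text numeric serialization and hypothetical prompt templates (the function names, field names, and wording below are illustrative, not the paper's actual format):

```python
# Hypothetical sketch of bidirectional time-series <-> text training pairs.
# Serialization scheme, prompt templates, and field names are assumptions,
# not the published Book-of-Thoth format.

import random

def serialize_series(values, precision=2):
    """Render a numeric series as a plain-text token sequence."""
    return ", ".join(f"{v:.{precision}f}" for v in values)

def make_pairs(values, description):
    """Build one training example per direction from a (series, text) pair."""
    series_text = serialize_series(values)
    ts_to_text = {
        "prompt": f"Describe the following time series: {series_text}",
        "target": description,
    }
    text_to_ts = {
        "prompt": f"Generate a time series matching this description: {description}",
        "target": series_text,
    }
    return [ts_to_text, text_to_ts]

# Toy example: a noisy upward trend paired with a natural-language description.
random.seed(0)
trend = [0.5 * t + random.gauss(0, 0.3) for t in range(12)]
for pair in make_pairs(trend, "A steadily rising series with mild noise."):
    print(pair["prompt"][:60], "->", pair["target"][:40])
```

Training on both directions of such pairs is one plausible way a corpus could align temporal patterns with language in a task- and domain-agnostic manner, since neither direction is tied to a specific downstream task such as forecasting or classification.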