[2410.05493] Transformers learn variable-order Markov chains in-context

arXiv - Machine Learning

About this article

Abstract page for arXiv paper 2410.05493: Transformers learn variable-order Markov chains in-context

Computer Science > Machine Learning
arXiv:2410.05493 (cs)
[Submitted on 7 Oct 2024 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: Transformers learn variable-order Markov chains in-context
Authors: Ruida Zhou, Chao Tian, Suhas Diggavi

Abstract: We study transformers' in-context learning of variable-order Markov chains (VOMCs), focusing on finite-sample accuracy as the number of in-context examples increases. Compared to fixed-order Markov chains (FOMCs), learning VOMCs is substantially more challenging because of the additional structure-learning component. The problem is naturally suited to a Bayesian formulation, in which the context-tree weighting (CTW) algorithm, originally developed in the information theory community for universal data compression, provides an optimal solution. Empirically, we find that single-layer transformers fail to learn VOMCs in context, whereas transformers with two or more layers can succeed, with additional layers yielding modest but noticeable improvements. In contrast to prior results on FOMCs, attention-only networks appear insufficient for VOMCs. To explain these findings, we provide explicit transformer constructions: one with $D+2$ layers that can exactly implement CTW for VOMCs of maximum order $D$, and a simplified two-layer construction that uses partial information for appr...
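
The CTW algorithm referenced in the abstract is a Bayesian mixture over all context-tree sources of maximum order $D$: each tree node $s$ keeps symbol counts, forms a local Krichevsky-Trofimov estimate $P_e^s$, and mixes it with the product of its children's weighted probabilities via $P_w^s = (1/2) P_e^s + (1/2) P_w^{0s} P_w^{1s}$, with $P_w^s = P_e^s$ at depth-$D$ leaves; the root $P_w$ then yields the optimal predictive distribution. The sketch below illustrates this recursion for a binary alphabet; the function names and the recompute-from-scratch predictive step are illustrative assumptions of this summary, not the paper's transformer construction.

```python
import math
from collections import defaultdict

def kt_log_prob(a, b):
    """Log Krichevsky-Trofimov estimate P_e(a, b) after a zeros and b ones."""
    return (math.lgamma(a + 0.5) + math.lgamma(b + 0.5)
            - 2.0 * math.lgamma(0.5) - math.lgamma(a + b + 1.0))

def gather_counts(seq, depth):
    """counts[ctx] = [#zeros, #ones] observed after context ctx, where ctx
    lists the preceding symbols with the most recent one first."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(depth, len(seq)):          # first `depth` symbols serve as initial context
        for d in range(depth + 1):
            ctx = tuple(reversed(seq[t - d:t]))
            counts[ctx][seq[t]] += 1
    return counts

def ctw_log_prob(counts, depth, ctx=()):
    """CTW weighted log probability at node ctx:
    P_w = P_e at depth-D leaves, else 0.5 * P_e + 0.5 * P_w(0s) * P_w(1s)."""
    a, b = counts.get(ctx, (0, 0))
    log_pe = kt_log_prob(a, b)
    if len(ctx) == depth or a + b == 0:       # leaf node, or no data under this node
        return log_pe
    log_children = sum(ctw_log_prob(counts, depth, ctx + (s,)) for s in (0, 1))
    m = max(log_pe, log_children)             # log-sum-exp of the two mixture terms
    return math.log(0.5) + m + math.log(math.exp(log_pe - m) + math.exp(log_children - m))

def ctw_predict_one(seq, depth):
    """Predictive probability P(next symbol = 1 | seq) as a ratio of
    weighted block probabilities, P_w(seq, 1) / P_w(seq)."""
    lp_seq = ctw_log_prob(gather_counts(seq, depth), depth)
    lp_ext = ctw_log_prob(gather_counts(seq + [1], depth), depth)
    return math.exp(lp_ext - lp_seq)

# Toy usage: a short binary "in-context" sequence, maximum order D = 3.
seq = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(ctw_predict_one(seq, depth=3))
```

This sketch rebuilds the count tree on every call for clarity; a practical CTW implementation would instead update only the $D+1$ nodes along the current context path and maintain the weighted probabilities incrementally, giving constant work per symbol for fixed $D$.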

Originally published on March 31, 2026. Curated by AI News.
