[2410.05493] Transformers learn variable-order Markov chains in-context

arXiv - Machine Learning

About this article

Abstract page for arXiv paper 2410.05493: Transformers learn variable-order Markov chains in-context

Computer Science > Machine Learning
arXiv:2410.05493 (cs)
[Submitted on 7 Oct 2024 (v1), last revised 28 Mar 2026 (this version, v2)]

Title: Transformers learn variable-order Markov chains in-context
Authors: Ruida Zhou, Chao Tian, Suhas Diggavi

Abstract: We study transformers' in-context learning of variable-order Markov chains (VOMCs), focusing on finite-sample accuracy as the number of in-context examples increases. Compared to fixed-order Markov chains (FOMCs), learning VOMCs is substantially more challenging because of the additional structure-learning component. The problem is naturally suited to a Bayesian formulation, in which the context-tree weighting (CTW) algorithm, originally developed in the information theory community for universal data compression, provides an optimal solution. Empirically, we find that single-layer transformers fail to learn VOMCs in context, whereas transformers with two or more layers can succeed, with additional layers yielding modest but noticeable improvements. In contrast to prior results on FOMCs, attention-only networks appear insufficient for VOMCs. To explain these findings, we provide explicit transformer constructions: one with $D+2$ layers that can exactly implement CTW for VOMCs of maximum order $D$, and a simplified two-layer construction that uses partial information for appr...
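
The CTW algorithm referenced in the abstract is a Bayesian mixture over all context-tree sources of maximum order $D$: each tree node $s$ keeps symbol counts, forms a local Krichevsky-Trofimov estimate $P_e^s$, and mixes it with the product of its children's weighted probabilities via $P_w^s = (1/2) P_e^s + (1/2) P_w^{0s} P_w^{1s}$, with $P_w^s = P_e^s$ at depth-$D$ leaves; the root $P_w$ then yields the optimal predictive distribution. The sketch below illustrates this recursion for a binary alphabet; the function names and the recompute-from-scratch predictive step are illustrative assumptions of this summary, not the paper's transformer construction.

```python
import math
from collections import defaultdict

def kt_log_prob(a, b):
    """Log Krichevsky-Trofimov estimate P_e(a, b) after a zeros and b ones."""
    return (math.lgamma(a + 0.5) + math.lgamma(b + 0.5)
            - 2.0 * math.lgamma(0.5) - math.lgamma(a + b + 1.0))

def gather_counts(seq, depth):
    """counts[ctx] = [#zeros, #ones] observed after context ctx, where ctx
    lists the preceding symbols with the most recent one first."""
    counts = defaultdict(lambda: [0, 0])
    for t in range(depth, len(seq)):          # first `depth` symbols serve as initial context
        for d in range(depth + 1):
            ctx = tuple(reversed(seq[t - d:t]))
            counts[ctx][seq[t]] += 1
    return counts

def ctw_log_prob(counts, depth, ctx=()):
    """CTW weighted log probability at node ctx:
    P_w = P_e at depth-D leaves, else 0.5 * P_e + 0.5 * P_w(0s) * P_w(1s)."""
    a, b = counts.get(ctx, (0, 0))
    log_pe = kt_log_prob(a, b)
    if len(ctx) == depth or a + b == 0:       # leaf node, or no data under this node
        return log_pe
    log_children = sum(ctw_log_prob(counts, depth, ctx + (s,)) for s in (0, 1))
    m = max(log_pe, log_children)             # log-sum-exp of the two mixture terms
    return math.log(0.5) + m + math.log(math.exp(log_pe - m) + math.exp(log_children - m))

def ctw_predict_one(seq, depth):
    """Predictive probability P(next symbol = 1 | seq) as a ratio of
    weighted block probabilities, P_w(seq, 1) / P_w(seq)."""
    lp_seq = ctw_log_prob(gather_counts(seq, depth), depth)
    lp_ext = ctw_log_prob(gather_counts(seq + [1], depth), depth)
    return math.exp(lp_ext - lp_seq)

# Toy usage: a short binary "in-context" sequence, maximum order D = 3.
seq = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(ctw_predict_one(seq, depth=3))
```

This sketch rebuilds the count tree on every call for clarity; a practical CTW implementation would instead update only the $D+1$ nodes along the current context path and maintain the weighted probabilities incrementally, giving constant work per symbol for fixed $D$.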

Originally published on March 31, 2026. Curated by AI News.
