[2510.02524] Unraveling Syntax: How Language Models Learn Context-Free Grammars
Computer Science > Computation and Language
arXiv:2510.02524 (cs)
[Submitted on 2 Oct 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: Unraveling Syntax: How Language Models Learn Context-Free Grammars
Authors: Laura Ying Schulz, Daniel Mitropolsky, Tomaso Poggio

Abstract: While large models achieve impressive results, their learning dynamics are far from understood. Many domains of interest, such as natural language syntax, programming languages, and arithmetic problems, are captured by context-free grammars (CFGs). In this work, we extend prior work on neural language modeling of CFGs in a novel direction: how language modeling behaves with respect to CFG substructure, namely "subgrammars". We first define subgrammars and prove a set of fundamental theorems relating language modeling to subgrammars. We show that the language modeling loss (equivalently, the Kullback-Leibler divergence) recurses linearly over a grammar's top-level subgrammars; applied recursively, the loss decomposes into losses for "irreducible" subgrammars. We also prove that the constant in this linear recurrence is a function of the expected recursion, a notion we introduce. We show that under additional assumptions, parametrized models learn subgrammars in parallel. Empirically, we confirm that small transformers learn subgrammars in paral...
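The central objects of the abstract, a probabilistic CFG and the "subgrammar" rooted at a nonterminal, can be sketched concretely. The toy grammar, rule weights, and helper names below are illustrative assumptions, not taken from the paper:

```python
import random

# Toy probabilistic CFG: each nonterminal maps to weighted productions.
# Symbols absent from GRAMMAR are terminals. The grammar itself is an
# invented example, not one used in the paper.
GRAMMAR = {
    "S": [(["A", "B"], 1.0)],                # top level combines two subgrammars
    "A": [(["a"], 0.5), (["a", "A"], 0.5)],  # subgrammar A generates a+
    "B": [(["b"], 0.7), (["b", "B"], 0.3)],  # subgrammar B generates b+
}

def sample(symbol, rng):
    """Expand `symbol` top-down, returning a list of terminals."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    productions = GRAMMAR[symbol]
    rhs = rng.choices([p[0] for p in productions],
                      weights=[p[1] for p in productions])[0]
    out = []
    for s in rhs:
        out.extend(sample(s, rng))
    return out

def subgrammar(root):
    """Collect the rules reachable from `root`: the subgrammar rooted there."""
    seen, stack = {}, [root]
    while stack:
        nt = stack.pop()
        if nt in seen or nt not in GRAMMAR:
            continue
        seen[nt] = GRAMMAR[nt]
        for rhs, _ in GRAMMAR[nt]:
            stack.extend(rhs)
    return seen

rng = random.Random(0)
print(" ".join(sample("S", rng)))   # one string of the full grammar, e.g. some a's then b's
print(sorted(subgrammar("A")))      # nonterminals of subgrammar A
print(sorted(subgrammar("S")))      # the full grammar is its own subgrammar
```

Under this view, a string drawn from S always factors into an A-part followed by a B-part, which is the kind of substructure over which the paper's loss decomposition recurses.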