[2602.14814] Learning State-Tracking from Code Using Linear RNNs
Summary
This paper studies state-tracking in sequence models by converting permutation-composition tasks into code in the form of REPL traces, showing that linear RNNs capable of state-tracking excel in this next-token-prediction setting while Transformers fail.
Why It Matters
Understanding how well different neural architectures can track state is central to choosing sequence models for reasoning over code and long interactions. This research clarifies a structural limitation of Transformers and the potential of linear RNNs in this regime, which could influence future model design and applications in AI.
Key Takeaways
- Linear RNNs excel in state-tracking tasks compared to Transformers.
- Permutation composition tasks reveal limitations in sequence models.
- State-tracking in code is hard in general because actions are not always fully observable.
- The study frames this setting as tracking the state of a probabilistic finite-state automaton with deterministic state reveals.
- Linear RNNs can perform worse than non-linear RNNs in certain setups.
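To make the permutation-composition task in the takeaways concrete, here is a minimal sketch. The group size and episode length are illustrative choices (the state-tracking literature often uses S_5, the smallest non-solvable symmetric group); the model's job is to map the sequence of actions to the running composed state:

```python
import itertools
import random

def compose(p, q):
    # Apply p first, then q: position i of the result is q[p[i]]
    # (permutations represented as tuples mapping index -> value).
    return tuple(q[i] for i in p)

# A state-tracking episode over S_5: the input is a sequence of
# permutations (actions); the target at each step is the running
# composition (the state).
rng = random.Random(0)
s5 = list(itertools.permutations(range(5)))
actions = [rng.choice(s5) for _ in range(6)]

state = tuple(range(5))          # identity permutation
targets = []
for a in actions:
    state = compose(state, a)    # state_t = a_t applied after state_{t-1}
    targets.append(state)
```

A model that merely memorizes short patterns fails here, because the correct state at step t depends on the entire prefix of actions.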
Abstract (arXiv:2602.14814 [cs], submitted on 16 Feb 2026)
Title: Learning State-Tracking from Code Using Linear RNNs
Authors: Julien Siems, Riccardo Grazzi, Kirill Kalinin, Hitesh Ballani, Babak Rahmani
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In recent years, state-tracking tasks, particularly permutation composition, have become a testbed for understanding the limits of sequence-model architectures such as Transformers and RNNs (linear and non-linear). However, these are often sequence-to-sequence tasks: learning to map actions (permutations) to states, which is incompatible with the next-token-prediction setting commonly used to train language models. We address this gap by converting permutation composition into code via REPL traces that interleave state reveals (through prints) with variable transformations. We show that linear RNNs capable of state-tracking also excel in this setting, while Transformers still fail. Motivated by this representation, we investigate why tracking states in code is difficult in general: actions are not always fully observable. We frame this as tracking the state of a probabilistic finite-state automaton with deterministic state reveals, and show that linear RNNs can be worse than non-linear RNNs at tracking states in this setup.
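The REPL-trace construction described in the abstract can be sketched as follows. The exact trace format, prompt style, function name, and reveal probability below are illustrative assumptions, not the paper's actual data pipeline; the point is that permutation composition becomes ordinary next-token prediction once states are revealed as printed output inside code:

```python
import random

def make_repl_trace(n_steps, n=3, reveal_prob=0.5, seed=0):
    # Emit a REPL-style trace: a list variable is repeatedly permuted,
    # and its value is occasionally revealed via print. A language model
    # trained on such traces must track the hidden state to predict the
    # printed lines. (Format is a sketch, not the paper's exact one.)
    rng = random.Random(seed)
    state = list(range(n))
    lines = [f">>> x = {state}"]
    for _ in range(n_steps):
        perm = rng.sample(range(n), n)      # a random permutation of 0..n-1
        state = [state[i] for i in perm]    # mirror the effect on x
        lines.append(f">>> x = [x[i] for i in {perm}]")
        if rng.random() < reveal_prob:
            lines.append(">>> print(x)")
            lines.append(str(state))        # the reveal the model must predict
    return "\n".join(lines)
```

Executing the `>>> ` lines of a generated trace reproduces exactly the interleaved reveal lines, which is what makes the format self-checking.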