[2510.26792] Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Summary
This article examines how Transformer models learn sequences generated by Permuted Congruential Generators (PCGs), showing that Transformers can predict upcoming outputs in context despite the bit-level scrambling these generators apply to their hidden state.
Why It Matters
Understanding whether Transformers can learn and predict sequences from complex PRNGs like PCGs matters for machine learning, cryptanalysis, and AI interpretability alike. The work shows that Transformers can recover highly structured but deliberately obfuscated patterns, and that curriculum learning plays a central role in making this possible at scale.
Key Takeaways
- Transformers can effectively learn and predict sequences from complex Permuted Congruential Generators (PCGs).
- The study reveals a scaling law indicating that the number of sequence elements needed for accurate predictions grows with the modulus size.
- Curriculum learning becomes essential for successful training as the modulus grows.
- Novel clustering phenomena in embedding layers suggest that representations can transfer across different scales of moduli.
- The findings have implications for improving AI interpretability and enhancing cryptographic applications.
Computer Science > Machine Learning
arXiv:2510.26792 (cs)
[Submitted on 30 Oct 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Authors: Tao Tao, Maissam Barkeshli
Abstract: We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations, and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants, in tasks that are beyond published classical attacks. In our experiments we scale moduli up to $2^{22}$ using up to $50$ million model parameters and datasets with up to $5$ billion tokens. Surprisingly, we find that even when the output is truncated to a single bit, it can be reliably predicted by the model. When multiple distinct PRNGs are presented together during training, the model can jointly learn them, identifying structures from different permutations. We demonstrate a scaling la...
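For readers unfamiliar with the construction, the bit-level operations the abstract describes can be made concrete. Below is a minimal sketch of one well-known member of the family, PCG-XSH-RR 64/32 (a 64-bit LCG state whose output is a permuted, truncated 32-bit value); the paper's exact variants, moduli, and constants may differ from this illustration:

```python
def pcg32_step(state: int,
               mult: int = 6364136223846793005,  # standard PCG 64-bit LCG multiplier
               inc: int = 1442695040888963407):  # an odd increment (stream selector)
    """One step of PCG-XSH-RR 64/32: returns (new_state, 32-bit output)."""
    mask64, mask32 = (1 << 64) - 1, (1 << 32) - 1
    # 1) Ordinary LCG update of the hidden state, modulo 2^64.
    state = (state * mult + inc) & mask64
    # 2) "XSH": xorshift-high folds high-order state bits into the low bits,
    #    then the result is truncated to 32 bits.
    xorshifted = (((state >> 18) ^ state) >> 27) & mask32
    # 3) "RR": a data-dependent right rotation by the top 5 bits of the state.
    rot = state >> 59
    output = ((xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))) & mask32
    return state, output

# Emit a short output sequence from an arbitrary seed.
state = 0x853C49E6748FEA9B
outputs = []
for _ in range(8):
    state, out = pcg32_step(state)
    outputs.append(out)
```

The model's task in the paper is in-context prediction over sequences like `outputs`: given a prefix of emitted values (possibly truncated down to a single bit each), predict the next one without access to the hidden state, seed, or increment.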