[2604.04898] QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
Computer Science > Artificial Intelligence
arXiv:2604.04898 (cs)
[Submitted on 6 Apr 2026]

Title: QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
Authors: LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar

Abstract: Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to instill good proof-writing style by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3) expanding RL with a reasoning cache, which decomposes long proofs into iterative summarize-and-refine cycles and enables stronger test-time reasoning. QED-Nano surpas...
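The reasoning-cache idea in stage (3) — decomposing a long proof into iterative summarize-and-refine cycles — can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation: the functions `generate_step`, `summarize`, and `is_complete` are stand-ins for the model's proof-extension call, the cache-compression step, and a termination check (which in the paper would involve the rubric-based reward or a verifier).

```python
# Hypothetical sketch of a "reasoning cache" loop: each cycle the model
# refines the proof conditioned only on a compact cached summary, then
# the summary is updated. All three helpers below are stand-ins.

def generate_step(problem, cache):
    """Stand-in for the model extending the proof given the cached summary."""
    return f"step-{len(cache) + 1} for {problem}"

def summarize(cache, new_step):
    """Stand-in for compressing prior reasoning plus the newest step."""
    return cache + [new_step[:60]]  # keep only a short summary per step

def is_complete(cache, max_cycles):
    """Stand-in termination check (a rubric/verifier in the real system)."""
    return len(cache) >= max_cycles

def prove_with_cache(problem, max_cycles=3):
    """Iterative summarize-and-refine: because each cycle sees only the
    compact cache rather than the full transcript, the effective
    reasoning horizon can exceed the context window."""
    cache = []
    while not is_complete(cache, max_cycles):
        step = generate_step(problem, cache)   # refine using the cache
        cache = summarize(cache, step)         # fold step into the cache
    return cache

print(prove_with_cache("IMO-style inequality"))
```

The key design point this sketch highlights is that the context passed to each generation call is bounded by the cache size, not by the total length of the proof attempt.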