[2604.03472] Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
Computer Science > Computation and Language

arXiv:2604.03472 (cs) [Submitted on 3 Apr 2026]

Title: Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
Authors: Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou

Abstract: Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, however, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism for sustaining diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training, and yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-p...
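The core mechanism described in the abstract, a hard, non-stationary mask over the proposer's vocabulary, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the function name, the `drop_prob` parameter, and the use of NumPy are all assumptions; the paper applies the mask to a language model's output logits during training and generation.

```python
import numpy as np

def vocabulary_dropout(logits: np.ndarray, drop_prob: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Hard-mask a random subset of the vocabulary in the output logits.

    Masked entries are set to -inf so those tokens have zero probability
    after softmax. Resampling the mask on every call makes it
    non-stationary, so the proposer cannot lock into fixed token
    sequences that exploit the reward function.

    Illustrative sketch only; a real implementation would likely exempt
    special tokens (EOS, padding) from the mask.
    """
    # One Bernoulli draw per vocabulary entry (shared across the batch here).
    mask = rng.random(logits.shape[-1]) < drop_prob
    masked = logits.copy()
    masked[..., mask] = -np.inf
    return masked
```

A fresh mask would be drawn at each proposer step (or each generated problem), so the set of reachable tokens keeps shifting, which is the "non-stationary" property the abstract emphasizes.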