[2602.13217] VeRA: Verified Reasoning Data Augmentation at Scale
Summary
VeRA is a framework for generating verified reasoning data at scale. It converts static benchmark problems into executable specifications that produce fresh, automatically labeled variants, which makes evaluations harder to pass through memorization alone.
Why It Matters
Most current evaluation schemes are static: the same problems are reused repeatedly, so models can score well through memorization or format exploitation, and benchmarks eventually saturate. VeRA addresses this by generating diverse, verified problem variants from each seed problem at near-zero marginal cost, so that scores better reflect genuine reasoning rather than recall.
Key Takeaways
- VeRA transforms benchmarks into executable specifications for dynamic evaluation.
- It offers two modes: VeRA-E for equivalent problem rewriting and VeRA-H for generating harder tasks.
- The framework enhances evaluation quality by revealing memorization patterns.
- VeRA generates complex tasks with reliable labels without human involvement.
- Open-sourcing the code and datasets promotes further research and development.
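One way to picture how equivalent rewrites can reveal memorization is to compare a model's accuracy on the original seed problems with its accuracy on rewritten variants. The sketch below is illustrative only: the function names and the "gap" measure are assumptions made for this example, not the metric defined in the paper, and the result lists are made-up data.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of problems answered correctly."""
    if not correct:
        raise ValueError("need at least one result")
    return sum(correct) / len(correct)


def memorization_gap(seed_correct: Sequence[bool],
                     variant_correct: Sequence[bool]) -> float:
    """Hypothetical indicator: accuracy on seed problems minus accuracy on
    logically equivalent variants. A large positive value suggests the model
    may be recalling the seeds rather than reasoning."""
    return accuracy(seed_correct) - accuracy(variant_correct)


# Made-up outcomes for illustration (True = model answered correctly).
seed_results = [True, True, True, True]
variant_results = [True, False, True, False]

gap = memorization_gap(seed_results, variant_results)
print(f"seed accuracy:    {accuracy(seed_results):.2f}")
print(f"variant accuracy: {accuracy(variant_results):.2f}")
print(f"gap:              {gap:.2f}")
```

A gap near zero would be consistent with the model handling rewritten problems as well as the originals; how large a gap counts as meaningful depends on sample size and is not addressed by this sketch.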
Computer Science > Artificial Intelligence
arXiv:2602.13217 (cs)
[Submitted on 23 Jan 2026]
Title: VeRA: Verified Reasoning Data Augmentation at Scale
Authors: Zerui Cheng, Jiashuo Liu, Chunjie Wu, Jianzhu Yao, Pramod Viswanath, Ge Zhang, Wenhao Huang
Abstract: The main issue with most evaluation schemes today is their "static" nature: the same problems are reused repeatedly, allowing for memorization, format exploitation, and eventual saturation. To measure genuine AI progress, we need evaluation that is robust by construction, not by post-hoc detection. In response, we propose VeRA (Verified Reasoning Data Augmentation), a framework that converts benchmark problems into executable specifications, comprising (i) a natural language template with placeholder slots, (ii) a coherent generator that samples valid configurations, and (iii) a deterministic verifier that validates parameters and calculates the corresponding correct answers for each configuration. From a single seed problem, VeRA automatically creates unlimited verified variants with reliable labels at near-zero marginal cost without human involvement. VeRA operates in two complementary modes. VeRA-E (equivalent) rewrites problems while keeping the underlying logic intact, useful for detecting memorization versus genuine reasoning. VeRA-H (hardened) systematically increase...
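The three components of an executable specification named in the abstract (template, generator, verifier) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `ExecutableSpec` class, its method names, and the toy train problem are hypothetical and are not taken from the VeRA codebase.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, int]


@dataclass
class ExecutableSpec:
    """Hypothetical container for the three parts described in the abstract."""
    template: str                                 # (i) text with placeholder slots
    generator: Callable[[random.Random], Params]  # (ii) samples a configuration
    verifier: Callable[[Params], Optional[int]]   # (iii) validates and computes the answer

    def sample(self, rng: random.Random, max_tries: int = 100) -> Tuple[str, int]:
        """Draw configurations until the verifier accepts one, then return
        the rendered problem text together with its computed answer."""
        for _ in range(max_tries):
            params = self.generator(rng)
            answer = self.verifier(params)
            if answer is not None:
                return self.template.format(**params), answer
        raise RuntimeError("no valid configuration found")


# Toy seed problem, invented for this example.
def generate(rng: random.Random) -> Params:
    return {"speed": rng.randint(20, 120), "hours": rng.randint(1, 12)}


def verify(params: Params) -> Optional[int]:
    # Deterministic check: reject invalid parameters, otherwise compute the label.
    if params["speed"] <= 0 or params["hours"] <= 0:
        return None
    return params["speed"] * params["hours"]


spec = ExecutableSpec(
    template=("A train travels at {speed} km/h for {hours} hours. "
              "How many kilometres does it cover?"),
    generator=generate,
    verifier=verify,
)

rng = random.Random(0)  # fixed seed so the variants are reproducible
variants = [spec.sample(rng) for _ in range(3)]
for question, answer in variants:
    print(question, "->", answer)
```

Because the answer is computed by the verifier rather than written by hand, each sampled variant carries a label that is correct by construction, which is the property the abstract describes as "reliable labels ... without human involvement".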