[2602.22609] EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
Summary
EvolveGen introduces a framework for generating hardware model checking benchmarks with reinforcement learning, addressing the field's persistent benchmark gap.
Why It Matters
The development of high-quality benchmarks is crucial for evaluating hardware model checking techniques. EvolveGen's approach enhances the diversity and difficulty of benchmarks, allowing for better assessment of solver performance and encouraging innovation in verification methods.
Key Takeaways
- EvolveGen combines reinforcement learning with high-level synthesis to create diverse benchmarks.
- The framework addresses the existing benchmark gap by generating challenging instances.
- Solver runtime serves as a reward signal, enabling the discovery of solver-specific weaknesses.
- EvolveGen produces benchmarks in standard formats like AIGER and BTOR2.
- The approach promotes rigorous evaluation of new verification techniques.
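The idea of using solver runtime as a reward signal can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's actual implementation: the action set, the Q-learning-style update, and the `solver_runtime` stub (which stands in for timing a real model checker on the emitted AIGER/BTOR2 design) are all illustrative assumptions.

```python
import random

# Hypothetical sketch of runtime-as-reward benchmark generation:
# an agent assembles a computation graph from toy edit actions, and the
# reward is the (simulated) solver runtime, so the agent learns to favor
# constructions that make the resulting benchmark harder.

ACTIONS = ["add_adder", "add_multiplier", "fork_branch"]  # toy graph edits

def solver_runtime(graph):
    """Placeholder for compiling the graph and timing a real model
    checker; here, runtime simply grows with multiplier count."""
    return sum(2.0 if op == "add_multiplier" else 0.5 for op in graph)

def generate_benchmark(q_values, steps=5, epsilon=0.2, rng=None):
    """Build a graph epsilon-greedily from current action-value estimates."""
    rng = rng or random.Random(0)
    graph = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore
        else:
            action = max(ACTIONS, key=lambda a: q_values[a])  # exploit
        graph.append(action)
    return graph

def train(episodes=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        graph = generate_benchmark(q, rng=rng)
        reward = solver_runtime(graph)   # longer runtime => harder instance
        for action in graph:             # credit every action in the episode
            q[action] += lr * (reward - q[action])
    return q

q = train()
```

After training, the runtime-inflating action should carry the highest value estimate, mirroring how a runtime reward steers generation toward solver-specific weaknesses. A real setup would replace `solver_runtime` with a timed invocation of an actual model checker.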
Abstract
arXiv:2602.22609 (cs) · Computer Science > Hardware Architecture · Submitted on 26 Feb 2026
Title: EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
Authors: Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang
Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed only in representations such as BTOR2 without access to the originating register-transfer-level (RTL) designs, and biased toward extreme difficulty where instances are either trivial or intractable. These limitations hinder rigorous evaluation of new verification techniques and encourage overfitting of solver heuristics to a narrow set of problems. To address this, we introduce EvolveGen, a framework for generating hardware model checking benchmarks by combining reinforcement learning (RL) with high-level synthesis (HLS). Our approach operates at an algorithmic level of abstraction in which an RL agent learns to construct computation graphs. By compiling these graphs under different synthesis directives, we produce pairs of functionally equivalent but structurally distinct hardware designs, inducing challenging model c...