[2602.13218] Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning
Summary
The paper presents SSLogic, an agentic meta-synthesis framework that scales verifiable logic-reasoning training data for reinforcement learning through a closed-loop process of generation, validation, and repair.
Why It Matters
Reinforcement Learning from Verifiable Rewards (RLVR) depends on large volumes of programmatically checkable training data, and generating such data at scale is a key bottleneck. SSLogic addresses the limitations of existing synthesis methods, which are largely confined to instance-level perturbations, offering a scalable approach that could improve the reliability and effectiveness of RLVR training pipelines.
Key Takeaways
- SSLogic enables scalable synthesis of logical reasoning tasks at the task-family level.
- The framework incorporates a Multi-Gate Validation Protocol for enhanced reliability.
- Training on data evolved through SSLogic shows significant performance improvements over baseline models.
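To make the validation idea concrete, here is a minimal sketch of a multi-strategy consistency gate in the spirit of the Multi-Gate Validation Protocol: an instance is kept only when several independent solving strategies agree on the answer, and filtered as ambiguous otherwise. All names and the toy parity task are assumptions for illustration, not the paper's actual implementation.

```python
from collections import Counter

def multi_gate_filter(instance, solvers, min_agreement=1.0):
    """Hypothetical consistency gate: keep an instance only if the
    fraction of solver strategies agreeing on one answer meets the
    threshold; disagreement flags the instance as ambiguous."""
    answers = [solve(instance) for solve in solvers]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    agreement = top_count / len(answers)
    return (top_answer, True) if agreement >= min_agreement else (None, False)

# Toy demo: three independent "strategies" answering a parity question.
instance = [1, 0, 1, 1]
solvers = [
    lambda xs: sum(xs) % 2,                     # arithmetic parity
    lambda xs: len([x for x in xs if x]) % 2,   # count-based parity
    lambda xs: 0 if sum(xs) % 2 == 0 else 1,    # branch-based parity
]
print(multi_gate_filter(instance, solvers))  # all strategies agree -> (1, True)
```

A strategy that disagrees with the others would drop the agreement below 1.0 and cause the instance to be rejected rather than risk training on an ill-posed task.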
Computer Science > Artificial Intelligence
arXiv:2602.13218 (cs) [Submitted on 23 Jan 2026]
Title: Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning
Authors: Bowen Liu, Zhi Wu, Runquan Xie, Zhanhui Kang, Jia Li
Abstract: Scaling verifiable training signals remains a key bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning is a natural substrate: constraints are formal and answers are programmatically checkable. However, prior synthesis pipelines either depend on expert-written code or operate within fixed templates/skeletons, which limits growth largely to instance-level perturbations. We propose SSLogic, an agentic meta-synthesis framework that scales at the task-family level by iteratively synthesizing and repairing executable Generator-Validator program pairs in a closed Generate-Validate-Repair loop, enabling continuous family evolution with controllable difficulty. To ensure reliability, we introduce a Multi-Gate Validation Protocol that combines multi-strategy consistency checks with Adversarial Blind Review, where independent agents must solve instances by writing and executing code to filter ambiguous or ill-posed tasks. Starting from 400 seed families, two evolution rounds expand to 953 families and 21,389 verifiable instances (from 5,718). Training o...
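The closed Generate-Validate-Repair loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions: the "family" here is a hypothetical minimum-finding task, and the repair step simply re-draws a Generator-Validator pair rather than patching program code as the paper's agents do.

```python
import random

def generate_pair(family_spec, rng):
    """Hypothetical generator: emit one instance plus an executable
    validator for a toy 'find the minimum' task family (an assumption
    for illustration, not the paper's actual family)."""
    values = rng.sample(range(100), family_spec["length"])
    instance = {"values": values, "question": "index of the minimum"}
    gold_index = min(range(len(values)), key=values.__getitem__)
    validator = lambda answer: answer == gold_index   # programmatic check
    return instance, validator

def generate_validate_repair(family_spec, rounds=3, seed=0):
    """Closed-loop sketch: synthesize a Generator-Validator pair,
    gate it by executing the validator on a reference solution, and
    'repair' (here: regenerate) when the gate fails."""
    rng = random.Random(seed)
    for _ in range(rounds):
        instance, validator = generate_pair(family_spec, rng)
        reference = instance["values"].index(min(instance["values"]))
        if validator(reference):     # validation gate passes
            return instance, reference
        # a real repair step would patch the generator/validator programs here
    raise RuntimeError("family rejected: repair budget exhausted")

inst, ans = generate_validate_repair({"length": 5})
print(inst["values"], "->", ans)
```

The key property the loop preserves is that every surviving instance ships with an executable validator, so downstream RLVR training can score answers programmatically rather than relying on model-graded rewards.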