[2506.08672] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Summary
RuleReasoner is a reinforcement learning method for rule-based reasoning that combines a broad collection of curated tasks with domain-aware dynamic sampling, which reweights training batches across domains based on historical rewards. It outperforms frontier large reasoning models on both in-distribution and out-of-distribution benchmarks.
Why It Matters
Rule-based reasoning in real applications is hampered by wide variation in rule formats, types, and complexity. By letting reinforcement learning dynamically balance training across domains instead of relying on hand-engineered static data mixtures, this work shows substantial gains over existing models (e.g., over OpenAI-o1 on both in-distribution and out-of-distribution tasks), making it relevant for researchers and practitioners in AI and machine learning.
Key Takeaways
- RuleReasoner enhances rule-based reasoning through dynamic sampling.
- The method shows improved performance on both in-distribution and out-of-distribution tasks.
- The method achieves higher computational efficiency than previous approaches.
- Domain-aware dynamic sampling updates training batches based on historical rewards.
- This approach mitigates challenges associated with variations in rule formats and complexities.
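The sampling mechanism above can be sketched in code. The paper only states that domain weights are updated from historical rewards; the concrete scheme below (a softmax that upweights domains with lower mean reward, i.e., currently harder domains, with a `temperature` knob) is an illustrative assumption, not the paper's actual formula. Class and method names are hypothetical.

```python
import math
import random
from collections import defaultdict

class DomainSampler:
    """Illustrative sketch of domain-aware dynamic sampling:
    each training batch is resampled according to weights derived
    from every domain's historical rewards."""

    def __init__(self, domains, temperature=1.0):
        self.domains = list(domains)
        self.temperature = temperature
        self.history = defaultdict(list)  # domain -> recorded rewards

    def record(self, domain, reward):
        # Log a rollout reward (assumed in [0, 1]) for one domain.
        self.history[domain].append(reward)

    def weights(self):
        # Assumed scheme: softmax over (1 - mean reward), so domains
        # where the policy currently scores lower get sampled more.
        means = {
            d: (sum(self.history[d]) / len(self.history[d])
                if self.history[d] else 0.5)  # neutral prior for unseen domains
            for d in self.domains
        }
        logits = {d: (1.0 - means[d]) / self.temperature for d in self.domains}
        z = sum(math.exp(v) for v in logits.values())
        return {d: math.exp(v) / z for d, v in logits.items()}

    def sample_batch(self, pools, batch_size):
        # pools: domain -> list of training examples.
        w = self.weights()
        chosen = random.choices(
            self.domains, weights=[w[d] for d in self.domains], k=batch_size
        )
        return [random.choice(pools[d]) for d in chosen]
```

In an RL loop, `record` would be called after each rollout and `sample_batch` before each training step, so the data mixture tracks the policy's current weaknesses without any hand-tuned static schedule.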
Computer Science > Computation and Language
arXiv:2506.08672 (cs)
[Submitted on 10 Jun 2025 (v1), last revised 15 Feb 2026 (this version, v2)]
Title: RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Authors: Yang Liu, Jiaqi Li, Zilong Zheng
Abstract: Rule-based reasoning is acknowledged as one of the fundamental problems of reasoning. While recent studies show that large reasoning models (LRMs) have remarkable reasoning capabilities enhanced by reinforcement learning (RL), real applications still face severe challenges due to variations in rule formats, types, and complexity. To mitigate this issue, we introduce RuleReasoner, an effective method for rule-based reasoning via a wide collection of curated tasks and a novel domain-aware dynamic sampling approach in RL. Specifically, RuleReasoner resamples each training batch by updating the domain weights based on historical rewards. This facilitates domain balance and active learning schedules for RL, obviating static mix-training engineered by humans. Evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin ($\Delta$4.1% on eight ID tasks and $\Delta$10.4% on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher c...