[2506.08672] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

arXiv - Machine Learning · 3 min read

Summary

RuleReasoner strengthens rule-based reasoning by training with reinforcement learning over a wide collection of curated tasks, using a domain-aware dynamic sampling scheme that reweights training domains based on historical rewards.

Why It Matters

This research addresses a persistent challenge in rule-based reasoning: real-world rules vary widely in format, type, and complexity. By letting reinforcement learning adapt its training mix to these variations, the method achieves substantial improvements over frontier large reasoning models such as OpenAI-o1, making it relevant for researchers and practitioners in AI and machine learning.

Key Takeaways

  • RuleReasoner enhances rule-based reasoning through dynamic sampling.
  • The method shows improved performance on both in-distribution and out-of-distribution tasks.
  • Higher computational efficiency compared to previous methods is achieved.
  • Domain-aware dynamic sampling updates training batches based on historical rewards.
  • This approach mitigates challenges associated with variations in rule formats and complexities.

Computer Science > Computation and Language

arXiv:2506.08672 (cs) [Submitted on 10 Jun 2025 (v1), last revised 15 Feb 2026 (this version, v2)]

Title: RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Authors: Yang Liu, Jiaqi Li, Zilong Zheng

Abstract: Rule-based reasoning is acknowledged as one of the fundamental problems of reasoning. While recent studies show that large reasoning models (LRMs) have remarkable reasoning capabilities enhanced by reinforcement learning (RL), real applications still face severe challenges due to variations in rule formats, types, and complexity. To mitigate this issue, we introduce RuleReasoner, an effective method for rule-based reasoning via a wide collection of curated tasks and a novel domain-aware dynamic sampling approach in RL. Specifically, RuleReasoner resamples each training batch by updating the domain weights based on historical rewards. This facilitates domain balance and active learning schedules for RL, obviating static mix-training recipes engineered by humans. Evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin ($\Delta$4.1% on eight ID tasks and $\Delta$10.4% on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher c...
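The core mechanism described above — resampling each training batch by updating per-domain weights from historical rewards — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the inverse-reward weighting (harder domains sampled more often) and the exponential-moving-average update are assumptions, since the summary does not specify the exact update rule, and the class and method names are hypothetical.

```python
import random

class DomainAwareSampler:
    """Sketch of domain-aware dynamic sampling: per-domain weights are
    updated from historical rewards, and each training batch is resampled
    according to those weights. The inverse-reward weighting and EMA update
    are illustrative assumptions, not the paper's exact rule."""

    def __init__(self, domains, smoothing=0.9):
        self.domains = list(domains)
        self.smoothing = smoothing              # EMA factor for reward history
        self.avg_reward = {d: 0.5 for d in self.domains}  # neutral prior

    def update(self, domain, reward):
        # Track an exponential moving average of rewards per domain.
        prev = self.avg_reward[domain]
        self.avg_reward[domain] = self.smoothing * prev + (1 - self.smoothing) * reward

    def weights(self):
        # Weight each domain by (1 - avg reward), so domains with low
        # historical reward (harder domains) are sampled more often.
        raw = {d: 1.0 - self.avg_reward[d] + 1e-6 for d in self.domains}
        total = sum(raw.values())
        return {d: w / total for d, w in raw.items()}

    def sample_batch(self, pool, batch_size):
        # pool maps each domain name to a list of training examples.
        w = self.weights()
        chosen = random.choices(self.domains,
                                weights=[w[d] for d in self.domains],
                                k=batch_size)
        return [random.choice(pool[d]) for d in chosen]
```

In this sketch, a domain that consistently earns high rewards sees its sampling weight shrink, which approximates the "active learning schedule" the abstract describes: the batch composition shifts toward domains the policy has not yet mastered, without a hand-engineered static training mix.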
