[2602.22538] RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Summary
The paper presents RAIN-Merging, a gradient-free method designed to enhance instruction adherence in large reasoning models while preserving their structured thinking format.
Why It Matters
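As large reasoning models (LRMs) increasingly power AI applications, ensuring they follow instructions accurately is crucial for reliability. LRMs reason well over long chains but often ignore instructions about output format, constraints, or other specific requirements. RAIN-Merging addresses this gap by merging an instruction-tuned model into the LRM without gradient-based training and without compromising its reasoning capabilities, improving instruction adherence across the benchmarks evaluated.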
Key Takeaways
- RAIN-Merging merges an instruction-tuned model (ITM) into a large reasoning model (LRM) in weight space, without gradient-based training.
- The method preserves the structured reasoning format while enhancing instruction adherence.
- Improvements are consistent across different model scales and architectures.
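The merging described in these takeaways happens directly on weights, with no gradient updates. As a minimal sketch of plain task-arithmetic merging (a generic baseline, not RAIN-Merging itself), assuming the LRM and ITM were both fine-tuned from a shared base checkpoint and using hypothetical helper names `task_vector` and `merge` with a scalar weight `lam`:

```python
import numpy as np

# Hypothetical helpers: a "model" here is just a dict of parameter name -> array.
def task_vector(finetuned, base):
    """Per-parameter difference between a fine-tuned model and its base."""
    return {name: finetuned[name] - base[name] for name in base}

def merge(lrm, itm_vector, lam=0.5):
    """Task arithmetic: add a scaled ITM task vector onto the LRM weights."""
    return {name: lrm[name] + lam * itm_vector[name] for name in lrm}

# Toy weights sharing one base (an assumption about how the two models relate).
rng = np.random.default_rng(0)
base = {"layer.weight": rng.standard_normal((4, 4))}
lrm = {"layer.weight": base["layer.weight"] + 0.1 * rng.standard_normal((4, 4))}
itm = {"layer.weight": base["layer.weight"] + 0.1 * rng.standard_normal((4, 4))}

tau_itm = task_vector(itm, base)
merged = merge(lrm, tau_itm, lam=0.5)
print(merged["layer.weight"].shape)  # (4, 4)
```

A single global `lam` applied uniformly like this is the kind of naive merge the abstract calls fragile, because it ignores the mismatch between the LRM's thinking-plus-response output format and the ITM's answers-only format.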
arXiv:2602.22538 (Computer Science > Machine Learning), submitted on 26 Feb 2026
Authors: Zhehao Huang, Yuhang Liu, Baijiong Lin, Yixin Lou, Zhengbao He, Hanling Tian, Tao Li, Xiaolin Huang
Abstract: Large reasoning models (LRMs) excel at a long chain of reasoning but often fail to faithfully follow instructions regarding output format, constraints, or specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting a lightweight merging with minimal interference. However, we also demonstrate that naive merges are fragile because they overlook the output format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answers-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving thinking format and reasoning performance. First, with a small reaso...
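Two linear-algebra ideas named in the abstract, comparing the principal subspaces of task vectors and null-space projection, can be illustrated on toy matrices. This is a generic sketch under stated assumptions, not the authors' procedure (the abstract is cut off before describing it). `subspace_overlap` is one possible orthogonality measure: the mean squared cosine of the principal angles between the top-k left singular subspaces of two weight deltas. `null_space_project` removes the part of a weight delta that acts on a chosen set of input features, so that for any input in their span the merged layer's output equals the original layer's. The feature matrix `feats`, standing in for activations one wants to preserve (for example at thinking-format tokens), and all function names are hypothetical.

```python
import numpy as np

def principal_subspace(delta, k):
    """Top-k left singular vectors of a weight delta (its principal subspace)."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :k]

def subspace_overlap(delta_a, delta_b, k):
    """Mean squared cosine of principal angles: 0 = orthogonal, 1 = identical."""
    ua = principal_subspace(delta_a, k)
    ub = principal_subspace(delta_b, k)
    return float(np.linalg.norm(ua.T @ ub, "fro") ** 2 / k)

def null_space_project(delta, feats):
    """Project `delta` (d_out, d_in) so that it annihilates span(feats).

    feats: (d_in, n) matrix whose columns are input features to preserve.
    Returns delta @ (I - F F^+), hence (result @ feats) is ~0.
    """
    proj_onto_feats = feats @ np.linalg.pinv(feats)  # orthogonal projector onto col(F)
    return delta @ (np.eye(feats.shape[0]) - proj_onto_feats)

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 64, 4
# Random rank-k deltas standing in for LRM and ITM task vectors of one layer.
tau_lrm = rng.standard_normal((d_out, k)) @ rng.standard_normal((k, d_in))
tau_itm = rng.standard_normal((d_out, k)) @ rng.standard_normal((k, d_in))
print(subspace_overlap(tau_lrm, tau_itm, k))  # small for random subspaces
print(subspace_overlap(tau_lrm, tau_lrm, k))  # 1.0 up to rounding

# Hypothetical features to preserve (columns are input vectors).
feats = rng.standard_normal((d_in, 8))
tau_safe = null_space_project(tau_itm, feats)
print(np.abs(tau_safe @ feats).max())  # near 0
```

For random rank-k deltas in d dimensions this overlap is roughly k/d, far below 1; a similarly small value on real task vectors is what "nearly orthogonal" would look like under this particular measure. The projection trades away whatever part of the ITM update lies in the protected input directions, which is why the choice of features to preserve matters.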