[2509.21013] Predicting LLM Reasoning Performance with Small Proxy Model
Summary
This article presents rBridge, a method that uses small proxy models to predict the reasoning performance of large language models (LLMs), delivering substantial cost and efficiency gains when selecting pre-training datasets.
Why It Matters
As the demand for large language models grows, optimizing their training processes becomes crucial. This study highlights a method to leverage smaller models for effective reasoning predictions, potentially reducing costs and improving accessibility in AI research and applications.
Key Takeaways
- rBridge predicts large-model reasoning performance using small (≤1B-parameter) proxy models.
- It reduces dataset ranking costs by over 100x relative to the best existing baseline.
- It achieves the strongest correlation across six reasoning benchmarks at the 1B to 32B scale.
- Its predictive relationships transfer zero-shot across pre-training datasets at the 1B to 7B scale.
- The approach offers a practical path to cost-effective, reasoning-oriented pre-training.
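The core evaluation idea behind these takeaways is that a good proxy should rank candidate pre-training datasets the same way expensive large-model training runs would. A minimal sketch of that rank-correlation check, using hypothetical proxy scores and large-model accuracies (the dataset names and numbers below are illustrative, not from the paper):

```python
def rank_order(scores, reverse=False):
    """Return the rank (0 = best) of each item; ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=reverse)
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

# Hypothetical proxy scores (lower weighted NLL = better) for three
# candidate pre-training datasets, and the large-model accuracies we
# would hope they predict.
proxy_nll = {"web": 2.1, "code": 1.4, "math": 1.7}
large_acc = {"web": 0.31, "code": 0.58, "math": 0.49}

names = list(proxy_nll)
proxy_ranks = rank_order([proxy_nll[n] for n in names])                 # low NLL ranks first
target_ranks = rank_order([large_acc[n] for n in names], reverse=True)  # high accuracy ranks first

# A useful proxy reproduces the ranking of the expensive target runs.
print(proxy_ranks == target_ranks)  # → True
```

The 100x cost reduction comes from running only the cheap proxy scoring, rather than a full large-model training run, for each candidate dataset.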
Computer Science > Machine Learning
arXiv:2509.21013 (cs) [Submitted on 25 Sep 2025 (v1), last revised 26 Feb 2026 (this version, v3)]
Title: Predicting LLM Reasoning Performance with Small Proxy Model
Authors: Woosung Koh, Juyoung Suk, Sungjun Han, Se-Young Yun, Jamin Shin
Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging for reasoning capabilities, which exhibit emergent behavior that appears reliably only at larger model sizes, often exceeding 7B parameters. To address this, we introduce rBridge, showing that small proxies ($\leq$1B) can effectively predict large-model reasoning by aligning more closely with (1) the pre-training objective and (2) the target task. rBridge achieves this by weighting negative log-likelihood with task alignment, using reasoning traces from frontier models as gold labels. In our experiments, rBridge (i) reduces dataset ranking costs by over 100x relative to the best baseline, (ii) achieves the strongest correlation across six reasoning benchmarks at 1B to 32B scale, and (iii) zero-shot transfers predictive relationships across pre-training datasets at 1B to 7B scale. These findings indicate that rBridge offers a practical path for explor...
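The abstract's key mechanism, weighting negative log-likelihood by task alignment over gold reasoning traces, can be sketched as a weighted average of per-token NLLs. This is a minimal illustration under stated assumptions, not the paper's actual formulation: the function name, inputs, and the per-token weighting scheme here are all hypothetical.

```python
def rbridge_style_score(token_nlls, alignment_weights):
    """Hypothetical sketch of a task-alignment-weighted NLL score.

    token_nlls: per-token negative log-likelihoods of a gold reasoning
    trace (from a frontier model) evaluated under the small proxy model.
    alignment_weights: per-token weights emphasizing tokens most aligned
    with the target reasoning task.
    Returns a weighted average NLL; lower suggests the proxy fits the
    task-relevant parts of the trace better.
    """
    assert len(token_nlls) == len(alignment_weights) > 0
    total_weight = sum(alignment_weights)
    weighted = sum(w * nll for w, nll in zip(alignment_weights, token_nlls))
    return weighted / total_weight

# Up-weighting a well-predicted, task-relevant token lowers the score
# relative to a plain (uniform-weight) average NLL.
uniform = rbridge_style_score([0.5, 3.0], [1.0, 1.0])   # plain mean = 1.75
aligned = rbridge_style_score([0.5, 3.0], [3.0, 1.0])   # emphasizes token 0
print(aligned < uniform)  # → True
```

The design intuition from the abstract is that plain NLL on generic pre-training text tracks large-model reasoning poorly, whereas scoring the proxy against task-aligned reasoning traces keeps the signal close to both the pre-training objective and the target task.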