[2602.19519] Ada-RS: Adaptive Rejection Sampling for Selective Thinking


arXiv - Machine Learning

Summary

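The paper introduces Ada-RS (Adaptive Rejection Sampling), an algorithm-agnostic sample-filtering framework for training tool-using large language models (LLMs) to think selectively. The goal is for a model to reason at length only when a request calls for it, so that it uses fewer output tokens in cost- and latency-sensitive applications while maintaining or improving accuracy.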

Why It Matters

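LLMs are increasingly deployed where cost and latency are critical. Chain-of-thought improves accuracy on hard requests, but it wastes tokens on simple ones. Ada-RS addresses this by changing which training samples a model learns from, not the optimization algorithm itself, and teaches the model to skip thinking when a request does not need it. Because the paper reports large cuts in output tokens and thinking rate without losing accuracy, the approach is relevant to developers and researchers building tool-using LLM applications.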

Key Takeaways

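  • On a synthetic tool-call-oriented e-commerce benchmark (Qwen3-8B with LoRA), Ada-RS reduces average output tokens by up to 80% compared with standard algorithms.
  • For each context, it scores multiple sampled completions with an adaptive length-penalized reward, then uses stochastic rejection sampling to keep only high-reward candidates (or preference pairs) for training.
  • It is algorithm-agnostic: it plugs into both preference-pair methods such as DPO and grouped policy optimization methods such as DAPO.
  • It reduces the thinking rate by up to 95% while maintaining or improving accuracy.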
  • This approach highlights the importance of training-signal selection for efficient reasoning.
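The summary does not give Ada-RS's actual reward formula or acceptance rule, so the following is only a minimal sketch of the general idea under stated assumptions. Everything named here is hypothetical: the `Completion` record, `adaptive_length_penalized_rewards`, `stochastic_rejection_sample`, and the parameters `max_len`, `base_penalty`, and `temperature`. Two design choices are also assumptions rather than the paper's method. First, the "adaptive" part scales the length penalty by the group's accuracy, so easy requests are pushed harder toward short answers. Second, each candidate is accepted with probability exp((r − r_max) / temperature), which is a common stochastic rejection rule.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    correct: bool    # did the tool call / answer pass the task's check?
    num_tokens: int  # output length, including any thinking tokens


def adaptive_length_penalized_rewards(group, max_len=2048, base_penalty=0.5):
    """Hypothetical reward: correctness minus an adaptive length penalty.

    Assumption: the penalty weight grows with the group's accuracy. When most
    sampled completions already solve the request, it is treated as easy and
    long outputs are penalized harder; when few do, length matters less.
    """
    accuracy = sum(c.correct for c in group) / len(group)
    weight = base_penalty * accuracy
    return [float(c.correct) - weight * min(c.num_tokens / max_len, 1.0) for c in group]


def stochastic_rejection_sample(group, rewards, temperature=0.1, rng=None):
    """Keep each candidate with probability exp((r - r_max) / temperature).

    The best candidate has acceptance probability 1 and is always kept;
    lower-reward candidates survive exponentially less often.
    """
    rng = rng or random.Random()
    r_max = max(rewards)
    return [
        (c, r) for c, r in zip(group, rewards)
        if rng.random() < math.exp((r - r_max) / temperature)
    ]


# Example: three sampled completions for one (hypothetical) context.
group = [
    Completion("direct tool call, no thinking", correct=True, num_tokens=40),
    Completion("long chain of thought, then tool call", correct=True, num_tokens=900),
    Completion("short but wrong tool call", correct=False, num_tokens=30),
]
rewards = adaptive_length_penalized_rewards(group)
kept = stochastic_rejection_sample(group, rewards, rng=random.Random(0))
for c, r in kept:
    print(f"{r:.3f}  {c.text}")
```

In this toy group, the short correct answer scores highest, the long correct one scores lower because of the length penalty, and the wrong one scores below zero. The filter therefore mostly passes on examples of answering directly. Lowering `temperature` makes the filter stricter.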

arXiv:2602.19519 (cs), Computer Science > Artificial Intelligence. Submitted on 23 Feb 2026.

Title: Ada-RS: Adaptive Rejection Sampling for Selective Thinking

Authors: Yirou Ge, Yixi Li, Alec Chiu, Shivani Shekhar, Zijie Pan, Avinash Thangali, Yun-Shiuan Chuang, Chaitanya Kulkarni, Uma Kona, Linsey Pang, Prakhar Mehrotra

Abstract: Large language models (LLMs) are increasingly being deployed in cost and latency-sensitive settings. While chain-of-thought improves reasoning, it can waste tokens on simple requests. We study selective thinking for tool-using LLMs and introduce Adaptive Rejection Sampling (Ada-RS), an algorithm-agnostic sample filtering framework for learning selective and efficient reasoning. For each given context, Ada-RS scores multiple sampled completions with an adaptive length-penalized reward then applies stochastic rejection sampling to retain only high-reward candidates (or preference pairs) for downstream optimization. We demonstrate how Ada-RS plugs into both preference pair (e.g. DPO) and grouped policy optimization strategies (e.g. DAPO). Using Qwen3-8B with LoRA on a synthetic tool call-oriented e-commerce benchmark, Ada-RS improves the accuracy-efficiency frontier over standard algorithms by reducing average output tokens by up to 80% and reducing thinking rate by up to 95% while maintaining or improvi...
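The abstract says the retained samples can also be preference pairs for methods such as DPO. The following is a hedged sketch of one way that path could look. It is not the paper's procedure. It assumes each context's completions have already been scored with some length-penalized reward. The function `build_dpo_pairs`, the choice to pair every higher-reward completion against the lowest-reward one, and the exp((r_chosen − r_max) / temperature) acceptance rule are all hypothetical. The `prompt`/`chosen`/`rejected` layout mirrors the format many DPO training libraries accept.

```python
import math
import random


def build_dpo_pairs(prompt, scored, temperature=0.1, rng=None):
    """Hypothetical: turn scored completions for one context into DPO pairs.

    scored: list of (completion_text, reward) for the same prompt.
    Each completion that beats the lowest-reward one is paired against it,
    and the pair is kept with probability exp((r_chosen - r_max) / temperature),
    so pairs whose chosen side is near the best reward are kept most often.
    """
    rng = rng or random.Random()
    r_max = max(r for _, r in scored)
    rejected_text, r_min = min(scored, key=lambda item: item[1])
    pairs = []
    for text, r in scored:
        if r <= r_min:
            continue  # no preference signal against the worst candidate
        if rng.random() < math.exp((r - r_max) / temperature):
            pairs.append({
                "prompt": prompt,
                "chosen": text,
                "rejected": rejected_text,
                "margin": r - r_min,
            })
    return pairs


# Example: three scored completions for one (hypothetical) tool-call request.
scored = [
    ("call search_products(...) directly", 0.99),
    ("<think>long reasoning</think> call search_products(...)", 0.85),
    ("call wrong_tool(...)", -0.01),
]
pairs = build_dpo_pairs("Find red running shoes under $100", scored, rng=random.Random(0))
for p in pairs:
    print(p["chosen"], "over", p["rejected"])
```

If every completion in a group receives the same reward, the sketch returns no pairs. A context that offers no contrast then contributes nothing to training.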
