[2602.22871] Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching

arXiv - AI · Article · 4 min read

Summary

The paper presents Stitching Noisy Diffusion Thoughts, a framework that enhances reasoning in large language models by scoring low-cost diffusion-sampled trajectories with a process reward model and stitching their best steps into a coherent composite rationale, improving accuracy and reducing latency on reasoning tasks.

Why It Matters

This research addresses a limitation of existing aggregation strategies for large language models: trajectory-level selection and voting discard useful intermediate work. By reusing high-quality intermediate steps instead of throwing them away, the method improves performance on complex reasoning tasks such as math and coding, making it relevant to a broad range of AI applications.

Key Takeaways

  • Introduces a self-consistency framework for reasoning in large language models.
  • Improves accuracy by up to 23.8% across math and coding tasks.
  • Reduces latency by up to 1.8x compared to traditional models.
  • Utilizes a modular approach separating exploration from evaluation.
  • Demonstrates effectiveness particularly on harder reasoning problems.

Computer Science > Computation and Language

arXiv:2602.22871 (cs) · Submitted on 26 Feb 2026

Title: Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching

Authors: Roy Miles, Aysim Toker, Andreea-Maria Oncescu, Songcen Xu, Jiankang Deng, Ismail Elezi

Abstract: Reasoning with large language models often benefits from generating multiple chains-of-thought, but existing aggregation strategies are typically trajectory-level (e.g., selecting the best trace or voting on the final answer), discarding useful intermediate work from partial or "nearly correct" attempts. We propose Stitching Noisy Diffusion Thoughts, a self-consistency framework that turns cheap diffusion-sampled reasoning into a reusable pool of step-level candidates. Given a problem, we (i) sample many diverse, low-cost reasoning trajectories using a masked diffusion language model, (ii) score every intermediate step with an off-the-shelf process reward model (PRM), and (iii) stitch the highest-quality steps across trajectories into a composite rationale. This rationale then conditions an autoregressive (AR) model (solver) to recompute only the final answer. This modular pipeline separates exploration (diffusion) from evaluation and solution synthesis, avoiding monolithic unified hybrids while preserving broad search. Across m...
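The stitching step (iii) in the abstract can be sketched as a simple step-level selection loop. This is a minimal illustration, not the paper's implementation: the trajectories, the per-step alignment, and the stand-in `quality` scores are all hypothetical, and a real pipeline would use a masked diffusion LM to sample the trajectories and an actual PRM to score each step.

```python
# Sketch of reward-guided stitching: at each step index, keep the
# highest-scoring candidate step across all sampled trajectories.
# Assumes (for illustration only) that steps at the same index are
# comparable; the paper's actual alignment may differ.

def stitch_best_steps(trajectories, prm_score):
    """Build a composite rationale by greedily picking, at each step
    position, the step with the highest PRM score across trajectories."""
    max_len = max(len(t) for t in trajectories)
    composite = []
    for i in range(max_len):
        # Candidate steps at position i from every trajectory long enough.
        candidates = [t[i] for t in trajectories if i < len(t)]
        composite.append(max(candidates, key=prm_score))
    return composite

# Toy example: three sampled "trajectories" of reasoning steps, scored
# by a stand-in PRM that just looks up a hand-assigned quality value.
quality = {"a1": 0.9, "a2": 0.4, "b1": 0.2, "b2": 0.8, "c1": 0.5, "c2": 0.6}
trajs = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
rationale = stitch_best_steps(trajs, lambda step: quality[step])
print(rationale)  # -> ['a1', 'b2']
```

The composite rationale mixes steps from different trajectories (here, step one from the first trajectory and step two from the second), which is exactly the intermediate work that trajectory-level voting would have discarded; in the paper's pipeline this rationale then conditions an autoregressive solver that recomputes only the final answer.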
