[2510.24803] MASPRM: Multi-Agent System Process Reward Model

arXiv - AI · 3 min read

Summary

The MASPRM paper introduces a Multi-Agent System Process Reward Model that improves test-time performance by guiding search and selectively allocating compute in multi-agent systems.

Why It Matters

This research addresses the need for efficient test-time computation in multi-agent systems, where answer quality depends on how compute is spent during inference. By scoring partial transcripts and steering step-level search, MASPRM can make multi-agent AI applications more effective across domains, which is relevant for developers and researchers working on artificial intelligence and multi-agent systems.

Key Takeaways

  • MASPRM assigns values to partial inter-agent transcripts, improving decision-making during inference.
  • The model is trained using Monte Carlo Tree Search rollouts without requiring human annotations.
  • Performance improvements include up to 13.4 points in Hit@1 over policy likelihood.
  • The approach focuses computation on promising branches while pruning unpromising ones.
  • Benchmarks include GSM8K, MATH, MMLU, and LogiQA, demonstrating versatility across tasks.
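The "focusing computation on promising branches" idea from the takeaways above can be illustrated with a minimal sketch of value-guided step-level beam search. The `propose_steps` and `prm_score` callables are hypothetical stand-ins for an agent's action sampler and the process reward model, not MASPRM's actual interfaces.

```python
# Sketch of step-level beam search (SBS) guided by a process reward model.
# At every step, each surviving partial transcript is expanded with several
# candidate agent actions; the PRM scores each extension, and only the
# top-scoring branches are kept (the rest are pruned).

def step_level_beam_search(initial_state, propose_steps, prm_score,
                           beam_width=3, expand_k=4, max_steps=5):
    """Keep the `beam_width` highest-value partial transcripts at each step."""
    beam = [(initial_state, 0.0)]  # pairs of (partial transcript, PRM value)
    for _ in range(max_steps):
        candidates = []
        for transcript, _ in beam:
            # Sample `expand_k` candidate next actions for this branch.
            for step in propose_steps(transcript, expand_k):
                extended = transcript + [step]
                # Score the extended partial transcript with the PRM.
                candidates.append((extended, prm_score(extended)))
        # Prune: retain only the most promising branches.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0][0]  # best-scoring transcript found
```

Compared with sampling a single trajectory, this spends extra compute only on branches the value model considers promising, which is the control behavior the paper attributes to MASPRM.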

Computer Science > Multiagent Systems

arXiv:2510.24803 (cs) [Submitted on 28 Oct 2025 (v1), last revised 12 Feb 2026 (this version, v2)]

Title: MASPRM: Multi-Agent System Process Reward Model

Authors: Milad Yazdani, Mahdi Mostajabdaveh, Zirui Zhou, Ying Xiong

Abstract: Practical deployment of multi-agent systems (MAS) demands strong performance at test time, motivating methods that guide search during inference and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns values to partial inter-agent transcripts for each action and each agent, and acts as a controller during inference. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts labeled only with terminal outcome rewards, without requiring human step-level annotations, by propagating returns to local targets. During inference, MASPRM guides step-level beam search (SBS) and MCTS, focusing computation on promising branches and pruning unpromising ones. We train and test MASPRM across different tasks and domains, using GSM8K, MATH, MMLU, and LogiQA as benchmarks. Averaged across these benchmarks, MASPRM improves Hit@1 over policy likelihood by up to $+13.4$ points and improves ranking quality, reducing Hit@1$\to$Hit@5 gaps by up to $10.3$ points. MASPRM complements inference-time search by sc...
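The abstract's training recipe, turning terminal-only outcome rewards into per-step value targets, can be sketched as a simple Monte Carlo estimate: the target for a partial transcript is the mean terminal reward of all rollouts that pass through it. The data shapes below are assumptions for illustration, not the paper's exact labeling procedure.

```python
# Sketch: derive step-level value targets from rollouts that carry only a
# terminal outcome reward. Each rollout is (steps, terminal_reward); every
# prefix of a rollout is credited with that rollout's final reward, and
# prefixes shared by several rollouts receive the average.

from collections import defaultdict

def step_value_targets(rollouts):
    """Map each transcript prefix to the mean terminal reward over the
    rollouts that contain it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for steps, reward in rollouts:
        for t in range(1, len(steps) + 1):
            prefix = tuple(steps[:t])
            totals[prefix] += reward
            counts[prefix] += 1
    return {p: totals[p] / counts[p] for p in totals}
```

Because these targets come entirely from rollout outcomes, no human step-level annotation is needed, matching the paper's stated training setup.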

