[2602.23271] Evaluating Stochasticity in Deep Research Agents
Summary
This paper evaluates stochasticity in Deep Research Agents (DRAs), showing how run-to-run variability in their outputs can degrade research quality and proposing methods to reduce it.
Why It Matters
Understanding and addressing stochasticity in DRAs is crucial for enhancing their reliability in real-world applications such as finance and healthcare. This research provides a framework for evaluating and reducing variability, which can lead to more consistent and accurate research outcomes.
Key Takeaways
- Repeated runs of a DRA on an identical query can produce substantially different outcomes, findings, and citations.
- The study identifies three sources of stochasticity: information acquisition, information compression, and inference.
- Mitigating stochasticity can improve research quality without sacrificing output accuracy.
- Controlled experiments demonstrate a 22% reduction in average stochasticity with proposed methods.
- The findings are relevant for deploying DRAs in critical decision-making domains.
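The paper's evaluation framework quantifies variance across repeated executions; its exact metrics are not reproduced here, but the core idea can be sketched as a pairwise-dissimilarity measure over the citation sets from repeated runs. The function names, the Jaccard-based metric, and the sample data below are illustrative assumptions, not the authors' implementation:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def citation_variability(runs):
    """Mean pairwise Jaccard distance between the citation sets
    produced by repeated executions of the same query.
    0.0 = perfectly reproducible citations, 1.0 = no overlap."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical citation sets from three runs of one query.
runs = [
    {"arxiv:1", "arxiv:2", "arxiv:3"},
    {"arxiv:1", "arxiv:2", "arxiv:4"},
    {"arxiv:1", "arxiv:5", "arxiv:6"},
]
print(round(citation_variability(runs), 3))  # prints 0.7
```

The same pattern extends to findings and final outcomes by swapping in a suitable similarity function (e.g., embedding cosine similarity for free-text answers).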
Computer Science > Artificial Intelligence
arXiv:2602.23271 (cs) [Submitted on 26 Feb 2026]
Authors: Haotian Zhai, Elias Stengel-Eskin, Pratik Patil, Liu Leqi
Abstract: Deep Research Agents (DRAs) are promising agentic systems that gather and synthesize information to support research across domains such as financial decision-making, medical analysis, and scientific discovery. Despite recent improvements in research quality (e.g., outcome accuracy when ground truth is available), DRA system design often overlooks a critical barrier to real-world deployment: stochasticity. Under identical queries, repeated executions of DRAs can exhibit substantial variability in terms of research outcome, findings, and citations. In this paper, we formalize the study of stochasticity in DRAs by modeling them as information acquisition Markov Decision Processes. We introduce an evaluation framework that quantifies variance in the system and identify three sources of it: information acquisition, information compression, and inference. Through controlled experiments, we investigate how stochasticity from these modules across different decision steps influences the variance of DRA outputs. Our results show that reducing stochasticity can improve research output quality, with inference and early-sta...
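The abstract's framing of a DRA as an information acquisition Markov Decision Process can be illustrated with a toy rollout in which each module (acquisition, compression, inference) contributes stochasticity. Everything below is a simplified assumption for illustration, not the paper's actual formalization:

```python
import random

def rollout(query, seed, steps=3):
    """Toy DRA rollout: at each decision step, acquire documents,
    compress them into notes, then infer a final answer.
    A seeded RNG is the sole source of stochasticity here."""
    rng = random.Random(seed)
    corpus = [f"doc{i}" for i in range(10)]   # stand-in document pool
    notes = []
    for _ in range(steps):
        acquired = rng.sample(corpus, k=2)    # stochastic acquisition
        kept = rng.choice(acquired)           # stochastic compression
        notes.append(kept)
    answer = rng.choice(notes)                # stochastic inference
    return answer, notes

# Repeated executions of the identical query differ across seeds,
# but fixing the seed removes the run-to-run variability.
a1, _ = rollout("q", seed=0)
a3, _ = rollout("q", seed=0)
assert a1 == a3
```

In this framing, the paper's controlled experiments correspond to selectively freezing the randomness in one module at a time and measuring how output variance changes.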