[2502.00225] Should You Use Your Large Language Model to Explore or Exploit?
Summary
This article evaluates the effectiveness of large language models (LLMs) in addressing exploration-exploitation tradeoffs in decision-making tasks, revealing their strengths and limitations.
Why It Matters
Understanding how LLMs can be utilized for exploration and exploitation is crucial for advancing AI decision-making capabilities. This research highlights the potential and challenges of LLMs in practical applications, informing future developments in machine learning and AI.
Key Takeaways
- LLMs show promise in exploration tasks but struggle with exploitation tasks.
- Reasoning models are effective for exploitation but often impractical due to cost and speed.
- Non-reasoning models can improve performance on medium-difficulty tasks through tool use and in-context summarization.
- All studied LLMs performed worse than a simple linear regression baseline, even in non-linear settings.
- LLMs can effectively explore large action spaces with inherent semantics.
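To make the linear-regression baseline from the takeaways concrete, here is a minimal sketch of greedy exploitation in a contextual bandit: fit one least-squares model per arm on the observed history and pull the arm with the highest predicted reward. This is an illustrative toy, not the paper's exact experimental setup; all names and the synthetic data below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def exploit_with_linear_regression(contexts, rewards, arms_pulled, n_arms, new_context):
    """Greedy exploitation: fit a least-squares model per arm on the history,
    then pull the arm with the highest predicted reward for the new context.
    (Hypothetical sketch, not the paper's baseline implementation.)"""
    preds = np.zeros(n_arms)
    for arm in range(n_arms):
        mask = arms_pulled == arm
        if mask.sum() < contexts.shape[1]:  # too few samples to fit; predict 0
            continue
        X, y = contexts[mask], rewards[mask]
        w, *_ = np.linalg.lstsq(X, y, rcond=None)  # per-arm linear fit
        preds[arm] = new_context @ w
    return int(np.argmax(preds))

# Toy history: 2 arms, 3-dim contexts, linear rewards plus noise
n, d = 200, 3
contexts = rng.normal(size=(n, d))
arms = rng.integers(0, 2, size=n)
true_w = np.array([[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]])
rewards = np.einsum("ij,ij->i", contexts, true_w[arms]) + 0.1 * rng.normal(size=n)

# For this context, arm 1's true expected reward (2.0) beats arm 0's (0.0)
choice = exploit_with_linear_regression(contexts, rewards, arms, 2, np.array([0.0, 1.0, 0.0]))
print(choice)
```

The point of the baseline is that this handful of lines, with no language model in the loop, is what the studied LLMs were compared against on the exploitation side.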
Computer Science > Machine Learning
arXiv:2502.00225 (cs)
[Submitted on 31 Jan 2025 (v1), last revised 17 Feb 2026 (this version, v3)]
Title: Should You Use Your Large Language Model to Explore or Exploit?
Authors: Keegan Harris, Aleksandrs Slivkins
Abstract: We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. While previous work has largely studied the ability of LLMs to solve combined exploration-exploitation tasks, we take a more systematic approach and use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that reasoning models show the most promise for solving exploitation tasks, although they are still too expensive or too slow to be used in many practical settings. Motivated by this, we study tool use and in-context summarization with non-reasoning models. We find that these mitigations can substantially improve performance on medium-difficulty tasks; even then, however, all LLMs we study perform worse than a simple linear regression, even in non-linear settings. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics by suggesting suitable candidates to explore.
Subjects: Machine Learning (cs.LG); Artificial Intelligence
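For readers unfamiliar with the tradeoff the abstract studies, a textbook epsilon-greedy bandit (a standard algorithm, not the paper's method) makes the exploration/exploitation tension concrete: with small probability the agent explores a random arm, otherwise it exploits the empirically best one. The function and reward model below are illustrative assumptions.

```python
import random

def epsilon_greedy(pull, n_arms, horizon, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit (illustrative sketch).
    With probability epsilon, explore a random arm; otherwise exploit
    the arm with the best empirical mean reward so far."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon or min(counts) == 0:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploit
        r = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]           # running mean
        total += r
    return total, means

# Hypothetical Bernoulli arms with success probabilities 0.2 and 0.8
def pull(arm, rng):
    return 1.0 if rng.random() < (0.2, 0.8)[arm] else 0.0

total, means = epsilon_greedy(pull, n_arms=2, horizon=2000)
print(means[1] > means[0])  # the better arm is identified
```

The paper's "in silos" design isolates these two roles: it asks the LLM either to pick which arm to sample next (the explore branch) or to pick the best arm given a fixed history (the exploit branch), rather than run the whole loop.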