[2602.13209] LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets
Summary
The paper presents LemonadeBench, a benchmark that assesses the economic intuition of large language models (LLMs) by having them run a simulated lemonade stand, and finds that performance scales strongly with model sophistication.
Why It Matters
Understanding how LLMs perform in economic scenarios provides insights into their decision-making capabilities and limitations. This research could inform future developments in AI applications for business and finance, highlighting areas for improvement in model training.
Key Takeaways
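- LemonadeBench evaluates LLMs' economic decision-making in a simulated 30-day lemonade stand business: managing inventory of expiring goods, setting prices, and choosing operating hours.
- Performance scales with model sophistication: frontier models capture 70% of the theoretical optimal profit, a more than 10x improvement over basic models.
- Across six dimensions of business efficiency, models optimize locally rather than globally, excelling in some areas while showing surprising blind spots in others.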
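The headline results are stated as a share of theoretical optimal profit. A minimal sketch of that normalization, assuming the score is simply realized profit divided by the optimum (the helper names below are hypothetical, not the paper's scoring code), also makes explicit what the two headline numbers imply together: reading the "greater than 10x" as a ratio of profits against the same optimum, basic models must have captured less than 7% of it.

```python
def fraction_of_optimal(profit: float, optimal_profit: float) -> float:
    """Share of the theoretical optimal profit captured by one run.

    Hypothetical helper: assumes the benchmark score is profit / optimum.
    """
    if optimal_profit <= 0:
        raise ValueError("optimal_profit must be positive")
    return profit / optimal_profit


def implied_max_baseline_share(frontier_share: float, improvement: float) -> float:
    """Upper bound on a weaker model's share of the optimum, given that the
    frontier model's share is more than `improvement` times larger.

    Assumes both shares are measured against the same optimum.
    """
    return frontier_share / improvement


# Inputs taken from the abstract: frontier models at 70% of optimal,
# a greater than 10x improvement over basic models.
bound = implied_max_baseline_share(0.70, 10.0)
print(f"basic models captured under {bound:.0%} of the optimum")
```

Normalizing by the optimum rather than reporting raw profit lets runs with different market conditions be compared on one scale, provided the optimum itself is computed consistently.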
arXiv:2602.13209 (q-fin) [Submitted on 14 Jan 2026]
Title: LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets
Authors: Aidan Vyas
Abstract: We introduce LemonadeBench v0.5, a minimal benchmark for evaluating economic intuition, long-term planning, and decision-making under uncertainty in large language models (LLMs) through a simulated lemonade stand business. Models must manage inventory with expiring goods, set prices, choose operating hours, and maximize profit over a 30-day period, tasks that any small business owner faces daily. All models demonstrate meaningful economic agency by achieving profitability, with performance scaling dramatically with sophistication: from basic models earning minimal profits to frontier models capturing 70% of the theoretical optimum, a greater than 10x improvement. Yet our decomposition of business efficiency across six dimensions reveals a consistent pattern: models achieve local rather than global optimization, excelling in select areas while exhibiting surprising blind spots elsewhere.
Subjects: General Finance (q-fin.GN); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.13209 [q-fin.GN] (or arXiv:2602.13209v1 [q-fin.GN] for this version), https://doi.org/10.48550/arXiv.2602.13209
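The abstract lists the decisions a model makes each day (restock perishable inventory, set a price, choose operating hours) but not the simulator's demand model, costs, or expiry rules. The toy daily loop below is a sketch under invented assumptions: the linear demand curve, `UNIT_COST`, `HOURLY_COST`, `SHELF_LIFE`, and the names `Stand`, `run_day`, and `simulate` are all hypothetical and are not LemonadeBench's implementation. It only illustrates why expiring goods turn restocking into a multi-day planning problem.

```python
from dataclasses import dataclass, field

UNIT_COST = 0.50    # hypothetical ingredient cost per cup
HOURLY_COST = 2.00  # hypothetical cost per operating hour
SHELF_LIFE = 3      # hypothetical days before a purchased batch expires


@dataclass
class Batch:
    units: int
    days_left: int  # days until this batch expires


@dataclass
class Stand:
    cash: float = 0.0
    batches: list = field(default_factory=list)

    def stock(self) -> int:
        return sum(b.units for b in self.batches)

    def sell(self, wanted: int) -> int:
        """Sell oldest stock first (FIFO) so short-dated units go before they spoil."""
        sold = 0
        for batch in sorted(self.batches, key=lambda x: x.days_left):
            take = min(batch.units, wanted - sold)
            batch.units -= take
            sold += take
            if sold == wanted:
                break
        self.batches = [b for b in self.batches if b.units > 0]
        return sold

    def age(self) -> int:
        """Advance one day, discard expired batches, and return units spoiled."""
        for b in self.batches:
            b.days_left -= 1
        spoiled = sum(b.units for b in self.batches if b.days_left <= 0)
        self.batches = [b for b in self.batches if b.days_left > 0]
        return spoiled


def hourly_demand(price: float) -> float:
    """Hypothetical linear demand: fewer cups per hour as the price rises."""
    return max(0.0, 20.0 - 8.0 * price)


def run_day(stand: Stand, buy: int, price: float, hours: int) -> tuple:
    """One day: restock, sell at the chosen price and hours, then age inventory."""
    stand.cash -= buy * UNIT_COST
    if buy > 0:
        stand.batches.append(Batch(buy, SHELF_LIFE))
    wanted = int(hourly_demand(price) * hours)
    sold = stand.sell(min(wanted, stand.stock()))
    stand.cash += sold * price - hours * HOURLY_COST
    return sold, stand.age()


def simulate(days: int = 30, buy: int = 60, price: float = 1.25, hours: int = 6) -> tuple:
    """Run a fixed daily policy and return (final cash, total units spoiled)."""
    stand = Stand()
    total_spoiled = 0
    for _ in range(days):
        _, spoiled = run_day(stand, buy, price, hours)
        total_spoiled += spoiled
    return stand.cash, total_spoiled
```

With these made-up numbers, buying exactly the 60 cups demanded each day (`simulate()`) ends the 30 days with 990.0 in cash and nothing spoiled, while buying 100 per day (`simulate(buy=100)`) sells the same 60 cups but lets the surplus expire and ends at 390.0. A model that optimizes price alone would miss this kind of cross-day inventory cost.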