DeepMath: A lightweight math reasoning Agent with smolagents
Published December 4, 2025

By Daniel Fleischer (danf), Moshe Berchansky (mber), and Moshe Wasserblat (moshew), Intel AI Software Group

DeepMath is an aligned math reasoning agent built on Qwen3-4B Thinking and fine-tuned with GRPO (Group Relative Policy Optimization). Instead of emitting verbose text, the model writes tiny Python snippets for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing both errors and output length. The agent is implemented with the smolagents library.

We evaluate DeepMath on four math benchmarks: MATH500, AIME, HMMT, and HLE, and show that:

- 🤖 The math agent alone reduces output length by up to 66%, while often improving accuracy.
- ⚡ GRPO training improves the agent's performance even further on almost all benchmarks.

👉 Code and evaluation scripts: https://github.com/IntelLabs/DeepMath

👉 Model: https://huggingface.co/Intel/deepmath-v1

Why DeepMath?

Large language models (LLMs) have advanced reasoning capabilities, but mathematical problem solving remains challenging: chain-of-thought traces can be lengthy and prone to arithmetic mistakes. Recent work[^1][^2] demonstrates that small models can reach strong performance, and other studies[^3] investigate tool use to improve reliability. What those papers generally do not emphasize is reducing trace verbosity or explicitly training ...
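To make the "emit a snippet, execute it, fold the result back" loop concrete, here is a minimal toy sketch. It is not the DeepMath implementation (the real agent uses smolagents with a secure sandbox; see the linked repository): the `run_snippet` helper and its crude builtin whitelist are purely illustrative.

```python
# Toy illustration of DeepMath's execute-and-fold-back idea:
# instead of carrying out arithmetic step by step in prose, the model
# emits a tiny Python snippet, the harness runs it, and the numeric
# result is spliced back into the reasoning trace.
# NOTE: a restricted exec() is NOT a secure sandbox; it only stands in
# for one here.

SAFE_BUILTINS = {"sum": sum, "range": range, "min": min, "max": max}


def run_snippet(code: str) -> dict:
    """Execute a tiny snippet with a whitelisted namespace; return its variables."""
    namespace: dict = {}
    exec(code, {"__builtins__": SAFE_BUILTINS}, namespace)
    return namespace


# A step the model might emit instead of multiplying out terms in text:
snippet = "result = sum(k * k for k in range(1, 11))"
step_vars = run_snippet(snippet)

# The value is folded back into the reasoning trace:
print(f"Sum of squares 1..10 = {step_vars['result']}")  # → 385
```

Keeping intermediate steps as executed code rather than free-form text is what lets the agent shorten its traces while avoiding arithmetic slips.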