Jupyter Agents: training LLMs to reason with notebooks
Published September 10, 2025, by Baptiste Colle, Hanna Yukhymenko, and Leandro von Werra.

The past year has been all about giving LLMs more tools and autonomy to solve more complex, open-ended tasks. The goal of Jupyter Agent is to give the model the ultimate tool: code execution. A natural way to display multi-step code execution together with reasoning is a Jupyter notebook, which consists of code and markdown cells. So we built Jupyter Agent to execute code directly inside a Jupyter notebook and use this environment to solve data analysis and data science tasks. Think of it like Cursor, but living natively inside your data science workflow.

We built a demo of this vision with Qwen-3 Coder, currently one of the strongest coding models. This is a follow-up to our earlier work on jupyter-agent (v1).

While large models are starting to show useful behavior, the key question is how to keep improving them. To this end, we focus on strengthening smaller models on agentic data science tasks, where they currently struggle to compete with the large models. The goal of this project is to build a pipeline that first generates high-quality training data, then fine-tunes an existing small model, and finally evaluates whether the model's performance improves on relevant benchmarks. Let...
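At its core, the agent loop is simple: the model writes a code cell, the environment executes it, and the captured output is fed back so the model can write the next cell. Here is a minimal sketch of that cell-execution step; the `execute_cell` helper and the toy session are illustrative assumptions, not the project's actual implementation:

```python
import io
import contextlib

def execute_cell(code: str, namespace: dict) -> str:
    """Run one code cell in a shared namespace and capture its stdout,
    mimicking how a notebook-based agent would execute model-written code."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # state persists across cells via `namespace`
    except Exception as exc:
        # The error text is returned to the model so it can self-correct.
        return f"Error: {exc}"
    return buffer.getvalue()

# A toy multi-step session: state from the first cell is visible to the
# second, just as variables carry over between cells in a real notebook.
session = {}
cells = [
    "import statistics\ndata = [3, 1, 4, 1, 5]",
    "print(statistics.mean(data))",
]
outputs = [execute_cell(cell, session) for cell in cells]
print(outputs[1].strip())  # 2.8
```

A real implementation would run cells in a sandboxed kernel (e.g. via `jupyter_client`) rather than `exec` in-process, but the feedback loop of write, execute, observe is the same.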