[2603.27277] Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP
arXiv:2603.27277 (cs)
[Submitted on 28 Mar 2026]

Title: Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP

Authors: Martin Vogel, Falk Meyer-Eschenbach, Severin Kohler, Elias Grünewald, Felix Balzer

Abstract: Large Language Model (LLM) coding agents typically explore codebases through repeated file-reading and grep-searching, consuming thousands of tokens per query without structural understanding. We present Codebase-Memory, an open-source system that constructs a persistent, Tree-Sitter-based knowledge graph via the Model Context Protocol (MCP), parsing 66 languages through a multi-phase pipeline with parallel worker pools, call-graph traversal, impact analysis, and community discovery. Evaluated across 31 real-world repositories, Codebase-Memory achieves 83% answer quality versus 92% for a file-exploration agent, at ten times fewer tokens and 2.1 times fewer tool calls. For graph-native queries such as hub detection and caller ranking, it matches or exceeds the explorer on 19 of 31 languages.

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
ACM classes: D.2.3; D.2.7; D.3.4; H.3.3; I.2.2
Cite as: arXiv:2603.27277 [cs.SE] (or arXiv:2603.27277v1 [cs.SE] for th...
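
To make the graph-native queries named in the abstract more concrete (caller ranking, hub detection, impact analysis), here is a minimal toy sketch in Python. It is not the Codebase-Memory implementation: the edge list `CALLS` and the function names `build_callers`, `rank_by_callers`, and `impact_of` are hypothetical, and the call graph is written by hand rather than extracted with Tree-Sitter. It assumes only that a call graph is available as (caller, callee) pairs.

```python
from collections import defaultdict, deque

# Hypothetical call edges (caller, callee); illustrative only, not from the paper.
CALLS = [
    ("main", "load_config"),
    ("main", "run"),
    ("run", "parse"),
    ("run", "log"),
    ("parse", "log"),
    ("load_config", "log"),
]


def build_callers(edges):
    """Map each callee to the set of functions that call it directly."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return callers


def rank_by_callers(edges):
    """Caller ranking: callees sorted by number of distinct direct callers.

    Ties are broken alphabetically. The top entries are candidate hubs.
    """
    callers = build_callers(edges)
    return sorted(
        ((name, len(who)) for name, who in callers.items()),
        key=lambda item: (-item[1], item[0]),
    )


def impact_of(edges, changed):
    """Impact analysis: every function that transitively calls `changed`.

    Breadth-first search over reversed call edges; the `seen` set
    keeps the traversal finite even if the graph contains cycles.
    """
    callers = build_callers(edges)
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

With this toy graph, `rank_by_callers(CALLS)` puts `log` first with three direct callers, and `impact_of(CALLS, "parse")` returns `{"run", "main"}`. Answering such questions from a prebuilt graph is a single lookup or traversal, whereas a file-exploration agent would have to re-read and grep source files to reach the same result.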