Codebase-scale retrieval using AST-derived graphs + BM25 — reducing LLM context from 100K to 5K tokens [D]

Reddit - Machine Learning 1 min read

About this article

Wanted to share an approach I've been using for retrieval-augmented generation over large codebases and get feedback from people thinking about similar problems.

The problem

Naive codebase RAG typically works by chunking files into text segments and embedding them for similarity search. This breaks down on code because semantic similarity at the chunk level doesn't capture structural relationships: a function in file A calling a type defined in file C won't surface that dependency through em...
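The excerpt is cut off before the method itself is described, so the following is only a rough sketch of the general idea named in the title, not the poster's implementation. It assumes a Python codebase, uses the standard-library `ast` module to build a reference graph between top-level definitions, ranks definitions with a small hand-rolled Okapi BM25 (Lucene-style IDF), and then pulls in the direct graph neighbours of the top hit. The three files in `FILES` and all function names here (`index_definitions`, `build_reference_graph`, `bm25_scores`, `retrieve`) are hypothetical and exist purely for illustration.

```python
import ast
import math
import re
from collections import Counter, defaultdict

# Hypothetical three-file codebase, invented for this sketch.
FILES = {
    "a.py": "from c import Config\n\ndef load_settings(path):\n    return Config(path)\n",
    "b.py": "def render_page(html):\n    return html.strip()\n",
    "c.py": "class Config:\n    def __init__(self, path):\n        self.path = path\n",
}

def _top_level_defs(src):
    """Yield top-level function and class nodes of one source file."""
    for node in ast.parse(src).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            yield node

def index_definitions(files):
    """Map each top-level definition name to (file name, source text)."""
    defs = {}
    for fname, src in files.items():
        for node in _top_level_defs(src):
            defs[node.name] = (fname, ast.get_source_segment(src, node))
    return defs

def build_reference_graph(files, defs):
    """Add edge u -> v when the body of u mentions the name of definition v."""
    graph = defaultdict(set)
    for src in files.values():
        for node in _top_level_defs(src):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and sub.id in defs and sub.id != node.name:
                    graph[node.name].add(sub.id)
    return graph

def tokenize(text):
    """Lowercase and split on non-alphanumerics, so load_settings -> load, settings."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores

def retrieve(query, files, top_k=1):
    """BM25 picks seed definitions; the graph adds what those seeds depend on."""
    defs = index_definitions(files)
    graph = build_reference_graph(files, defs)
    scores = bm25_scores(query, {name: text for name, (_, text) in defs.items()})
    hits = [name for name in scores if scores[name] > 0]
    seeds = sorted(hits, key=scores.get, reverse=True)[:top_k]
    selected = list(seeds)
    for seed in seeds:
        for dep in sorted(graph[seed]):
            if dep not in selected:
                selected.append(dep)
    return [(name, defs[name][0]) for name in selected]

if __name__ == "__main__":
    print(retrieve("load settings", FILES))
```

In this toy setup the query "load settings" shares no terms with `Config`, so lexical scoring alone would never return it; it is included only because the graph records that `load_settings` in `a.py` references a type defined in `c.py`. A real system would also need to resolve imports, methods, and attribute access, which this sketch deliberately ignores.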


Originally published on April 30, 2026. Curated by AI News.

