
Codebase-scale retrieval using AST-derived graphs + BM25 — reducing LLM context from 100K to 5K tokens [D]

Reddit - Machine Learning · April 30, 2026 · 1 min read

About this article

Wanted to share an approach I've been using for retrieval-augmented generation over large codebases and get feedback from people thinking about similar problems.

The problem

Naive codebase RAG typically works by chunking files into text segments and embedding them for similarity search. This breaks down on code because semantic similarity at the chunk level doesn't capture structural relationships: a function in file A calling a type defined in file C won't surface that dependency through em...
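The post is cut off at this point, so the author's actual implementation is not visible. As a rough illustration of the idea named in the title (AST-derived graphs combined with BM25), here is a minimal sketch under several assumptions of my own: the codebase is Python, only top-level functions and classes are indexed, BM25 is hand-rolled with commonly used defaults (`k1=1.5`, `b=0.75`), and lexical hits are expanded by one hop along reference edges. The `FILES` dictionary and every function name below are hypothetical.

```python
import ast
import math
import re
from collections import Counter

# Hypothetical three-file codebase mirroring the example in the text:
# a function in a.py uses a type defined in c.py.
FILES = {
    "a.py": "def load_user(uid):\n    return UserRecord(uid)\n",
    "b.py": "def format_date(ts):\n    return str(ts)\n",
    "c.py": "class UserRecord:\n    def __init__(self, uid):\n        self.uid = uid\n",
}

def index_definitions(files):
    """Map each top-level def/class name to (file, source text, referenced names)."""
    defs = {}
    for path, src in files.items():
        for node in ast.parse(src).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                refs = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
                defs[node.name] = (path, ast.get_source_segment(src, node), refs)
    return defs

def build_graph(defs):
    """Edge u -> v when definition u references a name that v defines."""
    return {name: sorted(r for r in refs if r in defs and r != name)
            for name, (_, _, refs) in defs.items()}

def tokenize(text):
    # Split identifiers on snake_case and camelCase boundaries, then lowercase.
    return [p.lower() for p in re.findall(r"[A-Za-z][a-z0-9]*", text)]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over tokenized docs: {name: [tokens]} -> {name: score}."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs.values()) / n
    df = Counter(t for d in docs.values() for t in set(d))
    scores = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores[name] = score
    return scores

def retrieve(query, files, top_k=1):
    defs = index_definitions(files)
    graph = build_graph(defs)
    docs = {name: tokenize(text) for name, (_, text, _) in defs.items()}
    scores = bm25_scores(query, docs)
    seeds = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Expand lexical hits with their one-hop structural dependencies.
    selected = list(seeds)
    for seed in seeds:
        for dep in graph[seed]:
            if dep not in selected:
                selected.append(dep)
    return [(name, defs[name][0]) for name in selected]

result = retrieve("load user", FILES)
print(result)
```

In this toy setup the query matches `load_user` lexically, and the graph edge then pulls in `UserRecord` from `c.py` even though that file would not be the top lexical hit for the query. A real system would likely need cross-file import resolution, methods and nested scopes, and a token budget for the expansion step, none of which this sketch attempts.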


Originally published on April 30, 2026. Curated by AI News.


Llms

A Hackable ML Compiler Stack in 5,000 Lines of Python [P]

Hey r/MachineLearning, The modern ML (LLM) compiler stack is brutal. TVM is 500K+ lines of C++. PyTorch piles Dynamo, Inductor, and Trito...

Reddit - Machine Learning · 1 min · 36 minutes ago

Llms

Make your paper part of your codebase: Integrating Claude Code/Github Copilot with Overleaf for writing papers [P]

Since a lot of the members here are researchers, I thought I'd share my setup that has significantly accelerated my writing process. Much...

Reddit - Machine Learning · 1 min · about 2 hours ago

Llms

OpenAI announces new advanced security for ChatGPT accounts, including a partnership with Yubico | TechCrunch

OpenAI is launching additional opt-in protections for ChatGPT accounts. The new security initiative includes a new partnership with secur...

TechCrunch - AI · 4 min · about 2 hours ago

Llms

After dissing Anthropic for limiting Mythos, OpenAI restricts access to Cyber, too | TechCrunch

OpenAI will begin rolling out its cybersecurity testing tool, GPT-5.5 Cyber, only "to critical cyber defenders" at first.
