I built a tool that turns repeated file reads into 13-token references. My AI coding sessions now use roughly 86% fewer tokens on file-heavy tasks, measured across my own sessions. [P]
I got tired of watching Claude Code re-read the same files over and over. A 2,000-token file read 5 times costs 10,000 tokens. So I built sqz.

The key insight: most token waste isn't from verbose content, it's from repetition. sqz keeps a SHA-256 content cache. The first read compresses normally; every subsequent read of the same file returns a 13-token inline reference instead of the full content. The LLM still understands it.

Real numbers from my sessions: a file read 5x: 10,000 tokens → 1,400 ...
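The dedup mechanism described above can be sketched roughly like this. This is a minimal illustration of the idea, not sqz's actual code: the class name, the reference format, and the cache layout are all my assumptions.

```python
import hashlib

class ReadCache:
    """Sketch of a SHA-256 content cache: the first read of a file returns
    its content; any later read of identical content returns a short
    inline reference instead (hypothetical format, not sqz's real one)."""

    def __init__(self):
        # Maps sha256 hex digest -> path where the content was first seen
        self._seen = {}

    def read(self, path: str) -> str:
        with open(path, "rb") as f:
            content = f.read()
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            # Repeat read: emit a compact reference the LLM can resolve
            # from context, instead of the full file body.
            return f"[ref sha256={digest[:12]} same-as={self._seen[digest]}]"
        self._seen[digest] = path
        return content.decode("utf-8", errors="replace")
```

Because the cache keys on content rather than path, two identical files (or the same file read twice) collapse to one full read plus cheap references, which is where the repetition savings come from.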