HydraLM: 22× faster decoding and 16× smaller state memory in long-context inference experiments [P]
I’ve been experimenting with HydraLM, a model built for long-context inference, and the numbers are getting a bit wild. The repo’s benchmark suite reports:

- 1.00 retrieval accuracy even when the target fact is buried at 90% depth of a 1M-token context
- p@1 = 0.987 and p@8 = 0.999 on a 1M-key fact bank (see the sketch below for what these measure)
- up to 1.8× faster decoding via speculative decoding
- roughly 99.8% FLOP savings and the full 16× state-memory reduction at long context, with reproducible results

The benchmark docs, reproduction scripts, and verification...
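To make the retrieval numbers concrete, here is a minimal sketch of what a needle-in-a-haystack depth test and a p@k measurement typically look like. The names (`build_haystack`, `precision_at_k`, `query_model`) are my own illustration under assumed conventions, not HydraLM's actual API; the real harness is in the repo's reproduction scripts.

```python
# Illustrative sketch only -- hypothetical names, not HydraLM's benchmark code.
from typing import Callable, Dict, List


def build_haystack(filler: List[str], needle: List[str], depth: float) -> List[str]:
    """Insert the needle fact at a fractional depth (0.9 = 90% into the context)."""
    pos = int(len(filler) * depth)
    return filler[:pos] + needle + filler[pos:]


def precision_at_k(
    queries: Dict[str, str],                  # query -> ground-truth key
    query_model: Callable[[str], List[str]],  # returns candidate keys ranked by score
    k: int,
) -> float:
    """Fraction of queries whose true key appears in the model's top-k candidates."""
    hits = sum(truth in query_model(q)[:k] for q, truth in queries.items())
    return hits / len(queries)


# Under this reading, p@1 = 0.987 means ~98.7% of queries rank the correct key
# first, and p@8 = 0.999 means the correct key is almost always in the top 8.
```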