[2510.05064] Boomerang Distillation Enables Zero-Shot Model Size Interpolation


arXiv - Machine Learning


Computer Science > Machine Learning

arXiv:2510.05064 (cs)

[Submitted on 6 Oct 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: Boomerang Distillation Enables Zero-Shot Model Size Interpolation

Authors: Sara Kangaslahti, Nihal V. Nayak, Jonathan Geuter, Marco Fumero, Francesco Locatello, David Alvarez-Melis

Abstract: Large language models (LLMs) are typically deployed under diverse memory and compute constraints. Existing approaches build model families by training each size independently, which is prohibitively expensive and provides only coarse-grained size options. In this work, we identify a novel phenomenon that we call boomerang distillation: starting from a large base model (the teacher), one first distills down to a small student and then progressively reconstructs intermediate-sized models by re-incorporating blocks of teacher layers into the student without any additional training. This process produces zero-shot interpolated models of many intermediate sizes whose performance scales smoothly between the student and teacher, often matching or surpassing pretrained or distilled models of the same size. We further analyze when this type of interpolation succeeds, showing that alignment between teacher and student through pruning and distillation is essential. Boomerang distillation thus prov...
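The re-incorporation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that each student layer stands in for one contiguous block of teacher layers, and that blocks are restored starting from the last layer. The function `interpolate_layers`, the `block_map` argument, and the string layer labels are all hypothetical names introduced for illustration only.

```python
from typing import Dict, List, Sequence


def interpolate_layers(
    student_layers: Sequence[str],
    teacher_layers: Sequence[str],
    block_map: Dict[int, range],
    num_blocks_to_restore: int,
) -> List[str]:
    """Build a hybrid layer stack with no training step.

    block_map[i] gives the teacher layer indices that student layer i is
    assumed to replace (a hypothetical alignment, not taken from the paper).
    """
    if not 0 <= num_blocks_to_restore <= len(student_layers):
        raise ValueError("num_blocks_to_restore is out of range")

    # Assumption: restore the last k blocks first; the order is a free choice here.
    restore_from = len(student_layers) - num_blocks_to_restore
    hybrid: List[str] = []
    for i, layer in enumerate(student_layers):
        if i >= restore_from:
            # Swap the student layer for its whole block of teacher layers.
            hybrid.extend(teacher_layers[j] for j in block_map[i])
        else:
            # Keep the distilled student layer as-is.
            hybrid.append(layer)
    return hybrid


# Toy setup: an 8-layer teacher and a 4-layer student, two teacher layers per block.
teacher = [f"T{j}" for j in range(8)]
student = [f"S{i}" for i in range(4)]
block_map = {i: range(2 * i, 2 * i + 2) for i in range(4)}

# Layer counts of the interpolated models as more blocks are restored.
sizes = [len(interpolate_layers(student, teacher, block_map, k)) for k in range(5)]
```

In this toy setting, restoring zero blocks returns the student unchanged and restoring all four returns the full teacher stack, with every intermediate depth available in between. In a real model the labels would be layer modules, and the sketch says nothing about output quality; the abstract attributes that to the alignment between teacher and student.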

Originally published on March 03, 2026. Curated by AI News.
