[2603.26556] When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
Computer Science > Computation and Language
arXiv:2603.26556 (cs) [Submitted on 27 Mar 2026]

Title: When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
Authors: Juan Gabriel Kostelec, Xiang Wang, Axel Laborieux, Christos Sourmpis, Qinghai Guo

Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate downstream multiple-choice benchmarks by ranking candidate answers with log-likelihood rather than requiring autoregressive generation, which can obscure important differences in model quality. For example, we show that a 7B-parameter distilled model that matches its teacher to within 0.2 pp under log-likelihood scoring actually falls behind by 20.8 pp when it must generate answers autoregressively. We propose a Hybrid Kimi Delta Attention (Hybrid-KDA) architecture paired with GenDistill, a multi-stage distillation pipeline, and use generation-based evaluation throughout to guide design decisions. Applying this approach to Qwen3-0.6B, we systematically ablate six d...
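The gap the abstract describes comes from the two evaluation protocols asking different questions: log-likelihood scoring only ranks a fixed candidate set, while generation-based evaluation requires the model's own top prediction to be correct. The toy sketch below (a hypothetical lookup-table "model" with made-up probabilities, not the paper's method) shows how a model can pass the restricted ranking while failing free generation:

```python
import math

# Toy stand-in for a language model: per-token log-probabilities for
# one context. The numbers are invented purely for illustration.
def token_logprob(context: str, token: str) -> float:
    table = {
        ("Q: 2+2= ", "4"): math.log(0.3),    # correct answer
        ("Q: 2+2= ", "5"): math.log(0.1),    # distractor
        ("Q: 2+2= ", "The"): math.log(0.5),  # degenerate continuation
    }
    return table.get((context, token), math.log(1e-6))

def loglikelihood_rank(context: str, candidates: list[str]) -> str:
    # Multiple-choice scoring: pick the candidate answer with the
    # highest model log-likelihood. The model never generates anything.
    return max(candidates, key=lambda c: token_logprob(context, c))

def greedy_generate(context: str, vocab: list[str]) -> str:
    # Generation-based evaluation: the model must emit its single most
    # likely token over the whole vocabulary, not just the candidates.
    return max(vocab, key=lambda t: token_logprob(context, t))

ctx = "Q: 2+2= "
print(loglikelihood_rank(ctx, ["4", "5"]))       # "4"  -> scored correct
print(greedy_generate(ctx, ["4", "5", "The"]))   # "The" -> generated wrong
```

Restricted to the two candidates, the model ranks the correct answer first, but its actual top prediction is a degenerate token, so generation-based accuracy drops to zero on this item. This is the kind of discrepancy that motivates evaluating distilled models with autoregressive generation throughout.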