[2603.26556] When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
Computer Science > Computation and Language
arXiv:2603.26556 (cs) [Submitted on 27 Mar 2026]

Title: When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
Authors: Juan Gabriel Kostelec, Xiang Wang, Axel Laborieux, Christos Sourmpis, Qinghai Guo

Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate downstream multiple-choice benchmarks by ranking candidate answers with log-likelihood rather than requiring autoregressive generation, which can obscure important differences in model quality. For example, we show that a 7B-parameter distilled model that matches its teacher to within 0.2 pp under log-likelihood scoring actually falls behind by 20.8 pp when it must generate answers autoregressively. We propose a Hybrid Kimi Delta Attention (Hybrid-KDA) architecture paired with GenDistill, a multi-stage distillation pipeline, and use generation-based evaluation throughout to guide design decisions. Applying this approach to Qwen3-0.6B, we systematically ablate six d...
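The gap the abstract describes comes from the two evaluation protocols asking different questions: log-likelihood scoring only ranks a fixed candidate set, while generation-based evaluation requires the model's own top prediction to be correct. The toy sketch below (a hypothetical lookup-table "model" with made-up probabilities, not the paper's method) shows how a model can pass the restricted ranking while failing free generation:

```python
import math

# Toy stand-in for a language model: per-token log-probabilities for
# one context. The numbers are invented purely for illustration.
def token_logprob(context: str, token: str) -> float:
    table = {
        ("Q: 2+2= ", "4"): math.log(0.3),    # correct answer
        ("Q: 2+2= ", "5"): math.log(0.1),    # distractor
        ("Q: 2+2= ", "The"): math.log(0.5),  # degenerate continuation
    }
    return table.get((context, token), math.log(1e-6))

def loglikelihood_rank(context: str, candidates: list[str]) -> str:
    # Multiple-choice scoring: pick the candidate answer with the
    # highest model log-likelihood. The model never generates anything.
    return max(candidates, key=lambda c: token_logprob(context, c))

def greedy_generate(context: str, vocab: list[str]) -> str:
    # Generation-based evaluation: the model must emit its single most
    # likely token over the whole vocabulary, not just the candidates.
    return max(vocab, key=lambda t: token_logprob(context, t))

ctx = "Q: 2+2= "
print(loglikelihood_rank(ctx, ["4", "5"]))       # "4"  -> scored correct
print(greedy_generate(ctx, ["4", "5", "The"]))   # "The" -> generated wrong
```

Restricted to the two candidates, the model ranks the correct answer first, but its actual top prediction is a degenerate token, so generation-based accuracy drops to zero on this item. This is the kind of discrepancy that motivates evaluating distilled models with autoregressive generation throughout.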