[2604.00223] Diversity-Aware Reverse Kullback-Leibler Divergence for Large Language Model Distillation
Computer Science > Machine Learning

arXiv:2604.00223 (cs)

[Submitted on 31 Mar 2026]

Title: Diversity-Aware Reverse Kullback-Leibler Divergence for Large Language Model Distillation

Authors: Hoang-Chau Luong, Dat Ba Tran, Lingwei Chen

Abstract: Reverse Kullback-Leibler (RKL) divergence has recently emerged as the preferred objective for large language model (LLM) distillation, consistently outperforming forward KL (FKL), particularly in regimes with large vocabularies and significant teacher-student capacity mismatch, where RKL focuses learning on dominant modes rather than enforcing dense alignment. However, RKL introduces a structural limitation that drives the student toward overconfident predictions. We first provide an analysis of RKL by decomposing its gradients into target and non-target components, and show that non-target gradients consistently push the target logit upward even when the student already matches the teacher, thereby reducing output diversity. In addition, RKL provides weak supervision over non-target classes, leading to poor tail alignment. To address these issues, we propose Diversity-aware RKL (DRKL), which removes this gradient effect and strengthens non-target supervision while preserving the optimization benefits of RKL. Extensive experiments across datasets a...
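As a concrete reference for the objective the abstract analyzes, the following is a minimal PyTorch sketch (not the authors' code) of the token-level RKL distillation loss, together with a numeric check of its per-logit gradient, which for a softmax student takes the closed form q_k * [(log q_k - log p_k) - RKL]. The function names, toy tensors, and this closed-form decomposition are illustrative assumptions used to make the gradient structure concrete; the paper's target/non-target split operates on exactly these per-logit terms, but DRKL itself is not reproduced here.

```python
# Minimal sketch of the RKL distillation objective, assuming a softmax
# student q and teacher p over the vocabulary. Not the authors' implementation.
import torch
import torch.nn.functional as F

def reverse_kl(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """RKL(q || p) = sum_v q(v) * (log q(v) - log p(v)), q = student, p = teacher."""
    log_q = F.log_softmax(student_logits, dim=-1)
    log_p = F.log_softmax(teacher_logits, dim=-1)
    return (log_q.exp() * (log_q - log_p)).sum(dim=-1)

# Toy example: a single position over a 5-token vocabulary.
torch.manual_seed(0)
student = torch.randn(5, requires_grad=True)
teacher = torch.randn(5)

loss = reverse_kl(student, teacher)
loss.backward()

# Closed-form per-logit gradient (illustrative assumption):
# d RKL / d z_k = q_k * [(log q_k - log p_k) - RKL],
# i.e. each logit's gradient is its own log-ratio term offset by the
# distribution-wide RKL value, which autograd should reproduce.
with torch.no_grad():
    q = F.softmax(student, dim=-1)
    log_ratio = F.log_softmax(student, dim=-1) - F.log_softmax(teacher, dim=-1)
    closed_form = q * (log_ratio - loss)

print(torch.allclose(student.grad, closed_form, atol=1e-6))  # True
```

Under these assumptions, splitting the closed-form gradient at the target index versus the remaining vocabulary gives the target/non-target decomposition the abstract refers to; in batched training the same loss would be averaged over sequence positions before backpropagation.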