[2604.04037] Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory


Computer Science > Machine Learning · arXiv:2604.04037 (cs) · Submitted on 5 Apr 2026

Title: Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
Authors: Dawar Jyoti Deka, Nilesh Sarkar

Abstract: Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is geometric: neural networks represent far more features than dimensions through superposition, and a student of width $d_S$ can encode at most $d_S \cdot g(\alpha)$ features, where $g(\alpha) = 1/((1-\alpha)\ln\frac{1}{1-\alpha})$ is a sparsity-dependent capacity function. Features beyond this budget are permanently lost, yielding an importance-weighted loss floor. We validate on a toy model (48 configurations, median accuracy >93%) and on Pythia-410M, where sparse autoencoders measure $F \approx 28{,}700$ features at $\alpha \approx 0.992$ (critical width $d_S^* \approx 1{,}065$). Distillation into five student widths confirms the predicted monotonic floor ordering. The observed floor decomposes into a geometric component and a width-independent architectural baseline ($R^2 = 0.993$). Linear probing shows coarse concepts survive even 88% feature loss, revealing the fl...
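The abstract's capacity bound can be turned into a quick back-of-the-envelope calculation. The sketch below implements the stated capacity function $g(\alpha)$ and the implied critical student width $d_S^* = F / g(\alpha)$; the function names are illustrative, and the inputs are the approximate values reported in the abstract ($F \approx 28{,}700$, $\alpha \approx 0.992$), so the result only roughly reproduces the paper's $d_S^* \approx 1{,}065$.

```python
import math

def capacity(alpha: float) -> float:
    """Features-per-dimension capacity from the abstract:
    g(alpha) = 1 / ((1 - alpha) * ln(1 / (1 - alpha)))."""
    s = 1.0 - alpha  # feature density (fraction of active features)
    return 1.0 / (s * math.log(1.0 / s))

def critical_width(num_features: float, alpha: float) -> float:
    """Minimum student width needed to encode all F features:
    d_S* = F / g(alpha)."""
    return num_features / capacity(alpha)

# Approximate values from the abstract (F and alpha are rounded there,
# so this lands near, not exactly on, the reported d_S* of about 1,065).
g = capacity(0.992)                  # roughly 25.9 features per dimension
d_star = critical_width(28_700, 0.992)
print(f"g(alpha) = {g:.1f}, critical width = {d_star:.0f}")
```

Students narrower than this critical width must drop features entirely, which is the geometric loss floor the paper argues for.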

Originally published on April 07, 2026. Curated by AI News.

