[2602.15438] Logit Distance Bounds Representational Similarity
Summary
This paper explores the relationship between logit distance and representational similarity in discriminative models, showing that when two models are close in logit distance, their internal representations are guaranteed to be approximately related by an invertible linear transformation, a property with direct consequences for model distillation.
Why It Matters
Understanding the bounds of representational similarity is vital for improving model distillation techniques in machine learning. This research provides insights that could enhance how models capture and preserve human-interpretable concepts, impacting fields like natural language processing and computer vision.
Key Takeaways
- Logit distance can provide linear similarity guarantees between models.
- KL divergence does not always ensure high linear representational similarity.
- Logit-distance distillation improves the preservation of linearly recoverable concepts.
- The study introduces a new measure of representational dissimilarity.
- Findings have implications for distillation in various machine learning applications.
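The contrast in the takeaways above, that small KL divergence need not mean small logit distance, can be illustrated with a toy computation. This is a sketch only: `logit_distance` below is an illustrative shift-invariant metric (maximum absolute difference of mean-centered logits), not necessarily the exact distance defined in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def logit_distance(z1, z2):
    """Illustrative logit distance: compare logits up to an additive
    per-example shift, which softmax ignores."""
    d = (z1 - z1.mean()) - (z2 - z2.mean())
    return float(np.max(np.abs(d)))

# Two logit vectors that agree on the high-probability classes but
# differ wildly on a class whose probability is nearly zero.
z_teacher = np.array([5.0, 4.0, -20.0])
z_student = np.array([5.0, 4.0, -40.0])

p, q = softmax(z_teacher), softmax(z_student)
print(f"KL(p||q)       = {kl(p, q):.2e}")    # tiny: distributions nearly identical
print(f"logit distance = {logit_distance(z_teacher, z_student):.2f}")  # large
```

Because the third class has probability on the order of e^-25, its huge logit disagreement contributes almost nothing to the KL divergence, while the logit distance remains large.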
Computer Science > Machine Learning
arXiv:2602.15438 (cs) [Submitted on 17 Feb 2026]
Title: Logit Distance Bounds Representational Similarity
Authors: Beatrix M. B. Nielsen, Emanuele Marconato, Luigi Gresele, Andrea Dittadi, Simon Buchholz
Abstract: For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. (2025) that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models' identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher's predictions while failing to preserve linearly recoverable concepts.
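The abstract's contrast between KL-based and logit-distance distillation can be sketched as two candidate training losses. These are hypothetical illustrations of the general idea, not the paper's exact objectives: the KL loss matches output distributions, while the logit loss penalizes logit differences directly (up to the additive shift that softmax ignores).

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stable log-softmax
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_distill_loss(teacher_logits, student_logits):
    """Standard KL distillation loss: match the teacher's output distribution."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    p = np.exp(log_p)
    return float(np.mean((p * (log_p - log_q)).sum(axis=-1)))

def logit_distill_loss(teacher_logits, student_logits):
    """Logit-matching distillation loss: mean squared difference of
    mean-centered logits (centering removes the softmax shift freedom)."""
    dt = teacher_logits - teacher_logits.mean(axis=-1, keepdims=True)
    ds = student_logits - student_logits.mean(axis=-1, keepdims=True)
    return float(np.mean((dt - ds) ** 2))

# A student that matches the teacher's predictions almost exactly in KL,
# yet has very different logits on a near-zero-probability class.
teacher = np.array([[5.0, 4.0, -20.0]])
student = np.array([[5.0, 4.0, -40.0]])
print(f"KL distillation loss    = {kl_distill_loss(teacher, student):.2e}")
print(f"logit distillation loss = {logit_distill_loss(teacher, student):.2f}")
```

On this example the KL loss is essentially zero while the logit loss is large, consistent with the abstract's point that a student can match the teacher's predictions under KL while its logits, and hence the paper's representational guarantees, drift substantially.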