[2602.15438] Logit Distance Bounds Representational Similarity


arXiv - AI · 4 min read

Summary

This paper examines the relationship between logit distance and representational similarity in discriminative models, showing that closeness in logit distance guarantees that the models' internal representations are linearly similar, a result with direct consequences for model distillation.

Why It Matters

Understanding the bounds of representational similarity is vital for improving model distillation techniques in machine learning. This research provides insights that could enhance how models capture and preserve human-interpretable concepts, impacting fields like natural language processing and computer vision.

Key Takeaways

  • Logit distance can provide linear similarity guarantees between models.
  • KL divergence does not always ensure high linear representational similarity.
  • Logit-distance distillation improves the preservation of linearly recoverable concepts.
  • The study introduces a new measure of representational dissimilarity.
  • Findings have implications for distillation in various machine learning applications.
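The contrast in the second takeaway can be made concrete with a small numerical sketch. The snippet below (a hypothetical illustration, not the paper's exact definitions; the centering convention for logits is an assumption) shows two models whose predicted distributions are nearly identical in KL divergence, yet whose logits differ substantially on a near-zero-probability class:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_divergence(p_logits, q_logits):
    """Mean KL(p || q) over a batch of logit vectors."""
    lp, lq = log_softmax(p_logits), log_softmax(q_logits)
    return (np.exp(lp) * (lp - lq)).sum(axis=-1).mean()

def logit_distance(p_logits, q_logits):
    """Mean Euclidean distance between mean-centered logit vectors.
    Logits are identified only up to a per-example additive shift, so each
    vector is centered first (one plausible convention; the paper's exact
    definition may differ)."""
    cp = p_logits - p_logits.mean(axis=-1, keepdims=True)
    cq = q_logits - q_logits.mean(axis=-1, keepdims=True)
    return np.linalg.norm(cp - cq, axis=-1).mean()

# Two models that agree on the dominant class but disagree wildly on a
# near-zero-probability class: KL is tiny, logit distance is large.
teacher = np.array([[10.0, 0.0, 0.0]])
student = np.array([[10.0, 0.0, -20.0]])
print(kl_divergence(teacher, student))   # small (well under 0.01)
print(logit_distance(teacher, student))  # large (over 15)
```

This is why a small KL divergence between teacher and student cannot, by itself, certify representational similarity, while a small logit distance can.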

Computer Science > Machine Learning
arXiv:2602.15438 (cs) [Submitted on 17 Feb 2026]

Title: Logit Distance Bounds Representational Similarity
Authors: Beatrix M. B. Nielsen, Emanuele Marconato, Luigi Gresele, Andrea Dittadi, Simon Buchholz

Abstract: For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. (2025) that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models' identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL divergence upper-bounds logit distance; yet the resulting bound fails to provide nontrivial control in practice. As a consequence, KL-based distillation can match a teacher's predictions while failing to preserve l...
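The abstract's notion of representations agreeing "up to an invertible linear transformation" can be probed with a simple proxy: fit the best least-squares linear map from one model's representations to the other's and measure the residual. This is a hypothetical stand-in for the paper's identifiability-class-based dissimilarity measure, not its actual definition:

```python
import numpy as np

def linear_dissimilarity(H1, H2):
    """Relative residual of the best least-squares linear map H1 @ W ≈ H2.
    Zero iff H2 is an exact linear transform of H1 (a simple proxy, not the
    paper's measure)."""
    W, *_ = np.linalg.lstsq(H1, H2, rcond=None)
    return np.linalg.norm(H1 @ W - H2) / np.linalg.norm(H2)

rng = np.random.default_rng(0)
H1 = rng.normal(size=(200, 16))   # representations from model 1
A = rng.normal(size=(16, 16))     # invertible with high probability
H2_exact = H1 @ A                 # exactly linearly related representations
H2_warped = np.tanh(H1) @ A       # nonlinearly distorted representations

print(linear_dissimilarity(H1, H2_exact))   # ~0: perfectly linearly similar
print(linear_dissimilarity(H1, H2_warped))  # clearly nonzero
```

Under this proxy, the paper's claim is that a small logit distance between two models forces a small dissimilarity of this kind, whereas a small KL divergence does not.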
