[2505.06595] Feature Representation Transferring to Lightweight Models via Perception Coherence
Summary
This paper introduces a method for transferring feature representations from larger teacher models to lightweight student models via a new notion called perception coherence: the student mimics the teacher's ranking of dissimilarities between inputs rather than the exact geometry of its feature space.
Why It Matters
As machine learning models become increasingly complex, transferring knowledge from larger models to smaller, more efficient ones is crucial for practical applications. This research provides a new approach that maintains performance while reducing model size, making it relevant for industries seeking efficient AI solutions.
Key Takeaways
- Introduces perception coherence as a new method for feature transfer.
- Proposes a loss function that focuses on dissimilarity rankings in feature space.
- Demonstrates that lightweight models can achieve competitive performance against larger models.
- Extends the concept of dissimilarity metrics into a probabilistic framework.
- Provides theoretical insights that enhance understanding of feature representation transfer.
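To make the ranking idea above concrete, here is a minimal sketch that measures how often a student's ordering of pairwise feature distances disagrees with the teacher's for each anchor point. This is an illustrative simplification, not the paper's loss: the function names are our own, and the actual loss proposed in the paper is a differentiable, probabilistic objective rather than this discrete discordance count.

```python
import numpy as np

def pairwise_dist(feats):
    # Squared Euclidean distances between all rows of a (n, d) feature matrix.
    sq = np.sum(feats ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    return np.maximum(d, 0.0)  # clamp tiny negative values from rounding

def ranking_discordance(teacher_feats, student_feats):
    """Fraction of anchor-wise distance comparisons whose order disagrees
    between teacher and student feature spaces (0.0 = fully coherent).

    For each anchor i and each pair (j, k), check whether the student agrees
    with the teacher on which of j, k is closer to i.
    """
    dt = pairwise_dist(teacher_feats)
    ds = pairwise_dist(student_feats)
    n = dt.shape[0]
    discordant = total = 0
    for i in range(n):
        for j in range(n):
            for k in range(j + 1, n):
                if i in (j, k):
                    continue
                total += 1
                if (dt[i, j] < dt[i, k]) != (ds[i, j] < ds[i, k]):
                    discordant += 1
    return discordant / total
```

Note that a student whose features are any distance-monotone transform of the teacher's (e.g. a global rescaling) scores a discordance of 0, even though the absolute geometry differs; this is exactly the relaxation the paper's abstract describes.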
Paper Details
Subject: Statistics > Machine Learning — arXiv:2505.06595 (stat)
Submitted on 10 May 2025 (v1); last revised 21 Feb 2026 (this version, v3).
Authors: Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone
Abstract: In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called perception coherence. Based on this notion, we propose a loss function, which takes into account the dissimilarities between data points in feature space through their ranking. At a high level, by minimizing this loss function, the student model learns to mimic how the teacher model perceives inputs. More precisely, our method is motivated by the fact that the representational capacity of the student model is weaker than that of the teacher model. Hence, we aim to develop a new method allowing for a better relaxation. This means that the student model does not need to preserve the absolute geometry of the teacher's, as long as it preserves global coherence through dissimilarity ranking. Importantly, while rankings are defined only on finite sets, our notion of perception coherence exte...