[2602.14635] Alignment Adapter to Improve the Performance of Compressed Deep Learning Models
Summary
The paper introduces the Alignment Adapter (AlAd), a method to enhance the performance of compressed deep learning models by aligning their token-level embeddings with those of larger models, thereby preserving contextual semantics.
Why It Matters
As deep learning models are increasingly deployed in resource-constrained environments, improving their performance without significantly increasing size or latency is critical. The Alignment Adapter offers a promising way to narrow the performance gap between compressed and larger models, making advanced AI more accessible.
Key Takeaways
- Alignment Adapter (AlAd) improves the performance of compressed models.
- AlAd aligns token-level embeddings with larger models to preserve semantics.
- The method is agnostic to compression techniques and can be easily integrated.
- Experiments show significant performance gains with minimal overhead.
- AlAd can be used as a plug-and-play module or jointly fine-tuned.
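The takeaways above describe AlAd as a sliding-window module that maps a compressed model's token embeddings into the embedding space of the original large model. The digest does not specify the adapter's internals, so the following is a minimal PyTorch sketch under stated assumptions: the sliding window is realized as a 1D convolution over the sequence axis, the window size of 3 is illustrative, and the MSE alignment objective is an assumed stand-in for the paper's actual loss.

```python
import torch
import torch.nn as nn


class AlignmentAdapter(nn.Module):
    """Hypothetical sketch of a sliding-window alignment adapter.

    Maps student (compressed-model) token embeddings of dimension
    d_student to the teacher's dimension d_teacher, mixing each token
    with its local window of neighbors. The Conv1d realization and the
    default window size are assumptions, not the paper's specification.
    """

    def __init__(self, d_student: int, d_teacher: int, window: int = 3):
        super().__init__()
        # Conv1d over the sequence axis implements the sliding window;
        # padding keeps the output sequence length unchanged.
        self.proj = nn.Conv1d(d_student, d_teacher,
                              kernel_size=window, padding=window // 2)

    def forward(self, student_emb: torch.Tensor) -> torch.Tensor:
        # student_emb: (batch, seq_len, d_student)
        x = student_emb.transpose(1, 2)   # (batch, d_student, seq_len)
        x = self.proj(x)                  # (batch, d_teacher, seq_len)
        return x.transpose(1, 2)          # (batch, seq_len, d_teacher)


def alignment_loss(adapted: torch.Tensor,
                   teacher_emb: torch.Tensor) -> torch.Tensor:
    # Token-level MSE between adapted student and teacher embeddings
    # (an assumed alignment objective for illustration).
    return nn.functional.mse_loss(adapted, teacher_emb)
```

Because the convolution only changes the embedding dimension, not the sequence length, the adapted embeddings can be compared token-by-token against the teacher's, which is what allows alignment across models with differing hidden sizes.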
Computer Science > Machine Learning
arXiv:2602.14635 (cs) [Submitted on 16 Feb 2026]
Title: Alignment Adapter to Improve the Performance of Compressed Deep Learning Models
Authors: Rohit Raj Rai, Abhishek Dhaka, Amit Awekar
Abstract: Compressed Deep Learning (DL) models are essential for deployment in resource-constrained environments, but their performance often lags behind that of their large-scale counterparts. To bridge this gap, we propose the Alignment Adapter (AlAd): a lightweight, sliding-window-based adapter that aligns the token-level embeddings of a compressed model with those of the original large model. AlAd preserves local contextual semantics, enables flexible alignment across differing dimensionalities or architectures, and is entirely agnostic to the underlying compression method. AlAd can be deployed in two ways: as a plug-and-play module over a frozen compressed model, or jointly fine-tuned with the compressed model for further performance gains. Through experiments on BERT-family models across three token-level NLP tasks, we demonstrate that AlAd significantly boosts the performance of compressed models with only marginal overhead in size and latency.
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as: arXiv:2602.14635 [cs.LG] ...
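The abstract names two deployment modes: plug-and-play over a frozen compressed model, or joint fine-tuning of adapter and model together. A hedged PyTorch sketch of how the two modes might differ in training setup follows; `compressed_model` and `adapter` are hypothetical handles for a compressed BERT-family encoder and an alignment-adapter module, and the optimizer choice and learning rate are illustrative, not taken from the paper.

```python
import torch


def build_optimizer(compressed_model: torch.nn.Module,
                    adapter: torch.nn.Module,
                    joint: bool = False,
                    lr: float = 1e-4) -> torch.optim.Optimizer:
    """Plug-and-play (joint=False): freeze the compressed model and
    train only the adapter. Joint fine-tuning (joint=True): update
    both the adapter and the compressed model."""
    # Freeze or unfreeze the compressed backbone.
    for p in compressed_model.parameters():
        p.requires_grad = joint

    # The adapter is always trainable.
    params = list(adapter.parameters())
    if joint:
        params += list(compressed_model.parameters())
    return torch.optim.AdamW(params, lr=lr)
```

The plug-and-play path is what keeps the method compression-agnostic in practice: the compressed model's weights are untouched, so the adapter can sit on top of any pruned, quantized, or distilled backbone.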