[2602.05735] CSRv2: Unlocking Ultra-Sparse Embeddings
Computer Science > Machine Learning
arXiv:2602.05735 (cs)
[Submitted on 5 Feb 2026 (v1), last revised 2 Mar 2026 (this version, v4)]

Title: CSRv2: Unlocking Ultra-Sparse Embeddings
Authors: Lixuan Guo, Yifei Wang, Tiansheng Wen, Yifan Wang, Aosong Feng, Bo Chen, Stefanie Jegelka, Chenyu You

Abstract: In the era of large foundation models, the quality of embeddings has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction: it maps dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact dense embeddings such as Matryoshka Representation Learning (MRL). Despite its promise, CSR suffers severe degradation in the ultra-sparse regime, where over 80% of neurons remain inactive, leaving much of its efficiency potential unrealized. In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive k-annealing, enhances representational quality via supervised contrastive objectives, and ensures end-to-end adaptability with full backbone finetunin...
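The two core ideas named in the abstract, k-sparse embeddings and progressive k-annealing, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`top_k_sparsify`, `annealed_k`) and the linear annealing schedule are illustrative assumptions about how a sparsity level k might be relaxed from a loose starting value down to an ultra-sparse target over training.

```python
import numpy as np

def top_k_sparsify(z: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude activations of z; zero out the rest.

    This is the standard top-k operation used in k-sparse representations:
    the output lives in the same high-dimensional space as z but has at
    most k nonzero entries.
    """
    out = np.zeros_like(z)
    idx = np.argpartition(np.abs(z), -k)[-k:]  # indices of k largest |z|
    out[idx] = z[idx]
    return out

def annealed_k(step: int, total_steps: int, k_start: int, k_target: int) -> int:
    """Illustrative linear schedule for progressive k-annealing.

    Training starts with a generous sparsity budget k_start and gradually
    tightens it toward the ultra-sparse target k_target, so the encoder
    never faces an abrupt jump in sparsity.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return max(k_target, round(k_start - frac * (k_start - k_target)))

# Example: anneal from k=64 down to k=4 over 1000 steps,
# sparsifying a 128-dimensional embedding at each stage.
rng = np.random.default_rng(0)
z = rng.normal(size=128)
for step in (0, 500, 1000):
    k = annealed_k(step, 1000, 64, 4)
    s = top_k_sparsify(z, k)
    print(step, k, np.count_nonzero(s))
```

A smoother (e.g. cosine or geometric) schedule would serve equally well here; the point is only that k decreases gradually rather than being fixed at the ultra-sparse value from the start.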