[2602.09229] Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning
Computer Science > Machine Learning
arXiv:2602.09229 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning
Authors: Xincan Feng, Taro Watanabe

Abstract: Cosine similarity is prevalent in contrastive learning, yet it assumes embedding magnitude is noise. We systematically study magnitude learning through a framework that independently controls query-side and document-side normalization. First, magnitude learning benefits retrieval and Retrieval-Augmented Generation (RAG), where queries and documents have distinct roles, but not Semantic Textual Similarity (STS) or CLIP, where inputs are interchangeable. Second, query and document magnitudes serve different roles: document magnitude scales inference scores, while query magnitude modulates training gradients. Normalizing one side consistently outperforms normalizing both, and the condition number of the Fisher Information Matrix predicts which side to normalize. Third, magnitude learning improves out-of-domain generalization more than in-domain performance, with gains of up to +72% vs. +7%, and requires retrieval-specialized pre-training or sufficient data. These findings provide practical guidance for retrieval and RAG across text and vision domains.

Subjects: Machine Learning (cs.LG); I...
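The core idea of the framework, independently toggling query-side and document-side L2 normalization before the similarity is computed, can be sketched as below. This is a minimal pure-Python illustration of that scoring choice, not the authors' implementation; function names and the two-vector example are assumptions for exposition.

```python
import math


def dot(u, v):
    """Inner product of two same-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def score(query, doc, normalize_query=True, normalize_doc=True):
    """Similarity with independently controlled normalization.

    Both flags True  -> standard cosine similarity (magnitude discarded).
    normalize_doc=False -> document magnitude still scales the score,
    the behavior the abstract associates with inference-time scoring.
    """
    q = l2_normalize(query) if normalize_query else query
    d = l2_normalize(doc) if normalize_doc else doc
    return dot(q, d)


# Cosine similarity: magnitude is ignored entirely.
cos = score([3.0, 4.0], [6.0, 8.0])                        # -> 1.0
# One-sided normalization: the document's norm (10.0) survives.
mag = score([3.0, 4.0], [6.0, 8.0], normalize_doc=False)   # -> 10.0
```

With both flags set, doubling the document vector leaves the score unchanged; with `normalize_doc=False`, the score grows linearly with the document's norm, which is why the choice of which side to normalize matters at inference time.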