[2602.19393] In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom
Summary
This paper defends cosine similarity in machine learning, arguing that constraining embeddings to the unit sphere eliminates the diagonal "gauge" freedom that can make cosine similarity arbitrary, so that cosine distance is a geometrically valid measure on normalized embeddings.
Why It Matters
Understanding the validity of cosine similarity is crucial for practitioners in machine learning, particularly those using embeddings. This paper clarifies misconceptions about cosine similarity's reliability when embeddings are properly normalized, which can significantly impact model performance and interpretation.
Key Takeaways
- Cosine similarity is valid when embeddings are normalized.
- Normalization removes gauge freedom issues associated with cosine similarity.
- Cosine distance equals exactly half the squared Euclidean distance on normalized embeddings.
- The pathologies attributed to cosine similarity stem from training with incompatible dot-product objectives, not from the metric itself.
- With proper normalization, cosine distance and Euclidean distance yield identical neighbor rankings.
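The last two takeaways can be checked numerically. Below is a minimal NumPy sketch on synthetic data (the matrix sizes and random embeddings are illustrative, not taken from the paper): for unit vectors $u, v$, we have $\|u - v\|^2 = 2 - 2\,u\cdot v$, so $1 - u\cdot v = \tfrac{1}{2}\|u - v\|^2$, and the two distances sort neighbors identically.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                 # synthetic "embeddings"
X /= np.linalg.norm(X, axis=1, keepdims=True)  # project rows onto the unit sphere

q = X[0]                                  # query embedding
cos_dist = 1.0 - X @ q                    # cosine distance to every row
sq_euclid = np.sum((X - q) ** 2, axis=1)  # squared Euclidean distance

# Identity on the unit sphere: 1 - u.v = ||u - v||^2 / 2
assert np.allclose(cos_dist, 0.5 * sq_euclid)

# Monotonic equivalence: both distances induce the same neighbor ranking
assert np.array_equal(np.argsort(cos_dist), np.argsort(sq_euclid))
```

Because the identity is exact (not just approximate), any monotonic ranking, top-k retrieval, or clustering based on one distance carries over unchanged to the other on normalized embeddings.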
Computer Science > Machine Learning
arXiv:2602.19393 (cs) [Submitted on 23 Feb 2026]
Title: In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom
Authors: Taha Bouhsine
Abstract
Steck, Ekanadham, and Kallus [arXiv:2403.05440] demonstrate that cosine similarity of learned embeddings from matrix factorization models can be rendered arbitrary by a diagonal "gauge" matrix $D$. Their result is correct and important for practitioners who compute cosine similarity on embeddings trained with dot-product objectives. However, we argue that their conclusion, cautioning against cosine similarity in general, conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere. We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically, and cosine distance reduces to exactly half the squared Euclidean distance. This monotonic equivalence implies that cosine-based and Euclidean-based neighbor rankings are identical on normalized embeddings. The "problem" with cosine similarity is not cosine similarity; it is the failure to normalize.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.19393 [cs.LG]
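The abstract's argument can be illustrated numerically. The sketch below (NumPy, with illustrative random factors rather than the paper's code) shows both sides: an arbitrary diagonal $D$ preserves the dot-product factorization $AB^\top = (AD)(BD^{-1})^\top$ yet changes cosine similarities among the rows of $A$, whereas on the unit sphere the only diagonal gauge that preserves unit row norms is a sign matrix ($D^2 = I$), which, consistent with the abstract's claim, leaves cosine similarity invariant.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 4))   # "user" embedding matrix (illustrative)
B = rng.normal(size=(7, 4))   # "item" embedding matrix (illustrative)

def cos_sim(M):
    """Pairwise cosine similarities among the rows of M."""
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T

# Steck et al.'s pathology: A -> A D, B -> B D^{-1} leaves the dot-product
# objective A B^T untouched but alters cosine similarities among rows of A.
D = np.diag(rng.uniform(0.1, 10.0, size=4))
assert np.allclose(A @ B.T, (A @ D) @ (B @ np.linalg.inv(D)).T)
assert not np.allclose(cos_sim(A), cos_sim(A @ D))

# On the unit sphere, a diagonal gauge preserving all row norms must have
# entries +-1 (so D^2 = I), and such a sign matrix S leaves cosine fixed:
# (a_i S)(a_j S)^T = a_i S S^T a_j^T = a_i a_j^T.
S = np.diag(rng.choice([-1.0, 1.0], size=4))
assert np.allclose(cos_sim(A), cos_sim(A @ S))
```

The first pair of assertions reproduces the ambiguity the cited critique exploits; the last shows why constraining embeddings to the sphere collapses that freedom to sign flips that cosine similarity cannot see.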