[2603.04516] Augmenting representations with scientific papers
Computer Science > Machine Learning
arXiv:2603.04516 (cs)
[Submitted on 4 Mar 2026]
Title: Augmenting representations with scientific papers
Authors: Nicolò Oreste Pinciroli Vago, Rocco Di Tella, Carolina Cuesta-Lázaro, Michael J. Smith, Cecilia Garraffo, Rafael Martínez-Galarza
Abstract: Astronomers have acquired vast repositories of multimodal data, including images, spectra, and time series, complemented by decades of literature that analyzes astrophysical sources. Still, these data sources are rarely systematically integrated. This work introduces a contrastive learning framework designed to align X-ray spectra with domain knowledge extracted from scientific literature, facilitating the development of shared multimodal representations. Establishing this connection is inherently complex, as scientific texts encompass a broader and more diverse physical context than spectra. We propose a contrastive pipeline that achieves a 20% Recall@1% when retrieving texts from spectra, proving that a meaningful alignment between these modalities is not only possible but capable of accelerating the interpretation of rare or poorly understood sources. Furthermore, the resulting shared latent space effectively encodes physically significant information. By fusing spectral and textual data, we improve the estimation of 20 physic...
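The contrastive alignment and the Recall@1% retrieval metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the encoders, the loss, or the temperature. The sketch assumes a symmetric InfoNCE objective over paired (spectrum, text) embeddings, and it assumes that Recall@k% means the fraction of spectra whose matching text ranks within the top k% of all candidate texts. All function names and the toy embeddings are hypothetical.

```python
import math

def cosine_similarity_matrix(spectra, texts):
    """Cosine similarity between every spectrum embedding and every text embedding."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]
    s = [normalize(v) for v in spectra]
    t = [normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]

def info_nce(sim, temperature=0.1):
    """Symmetric InfoNCE loss (assumed objective); pair (i, i) is the positive."""
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [x / temperature for x in row]
            m = max(logits)  # subtract the max for a stable log-sum-exp
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]
        return total / n

    columns = [list(c) for c in zip(*sim)]  # text-to-spectrum direction
    return 0.5 * (cross_entropy(sim) + cross_entropy(columns))

def recall_at_percent(sim, percent=1.0):
    """Fraction of spectra whose paired text ranks in the top `percent` % of texts."""
    n = len(sim)
    k = max(1, math.ceil(n * percent / 100.0))  # at least one candidate is kept
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of texts scored strictly higher than the paired one.
        rank = sum(1 for x in row if x > row[i])
        if rank < k:
            hits += 1
    return hits / n

# Toy data (hypothetical): the third spectrum is paired with a poorly aligned text.
toy_spectra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_texts = [[0.9, 0.1], [0.1, 0.9], [1.0, -1.0]]
toy_sim = cosine_similarity_matrix(toy_spectra, toy_texts)
toy_loss = info_nce(toy_sim)
toy_recall = recall_at_percent(toy_sim, percent=1.0)
```

With only three candidates, the top 1% rounds up to a single text, so `toy_recall` counts exact top-1 matches. On a realistic corpus of thousands of texts the same function would accept any paired text that ranks within the best 1% of candidates.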