[2603.03389] Towards Improved Sentence Representations using Token Graphs
Computer Science > Machine Learning
arXiv:2603.03389 (cs)
[Submitted on 3 Mar 2026]

Title: Towards Improved Sentence Representations using Token Graphs
Authors: Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Zorah Lähner, Moshe Eliasof

Abstract: Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations with a graph neural network, and finally aggregates them using a readout layer. Experimentally, our approach is remarkably robust and efficient: on a diagnostic stress test where 90% of tokens are random distractors, GLOT maintains over 97% accuracy while baseline methods collapse. Furthermore, it is competitive with state-of-the-art techniques on benchmarks like GLUE and MTEB with 20x fewer trainable parameters…
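The abstract outlines a three-stage pooling pipeline: build a latent token-similarity graph, refine token embeddings with a graph neural network, then aggregate with a readout. A minimal NumPy sketch of that pipeline is shown below; the function name `glot_pool`, the softmax-of-dot-products graph construction, the single tanh message-passing step, and the mean readout are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def glot_pool(H, num_layers=1):
    """Hypothetical sketch of structure-aware pooling in the spirit of GLOT.

    H: (n_tokens, d) token embeddings from a frozen LLM.
    1. Construct a latent token-similarity graph (assumed here to be a
       row-softmax of scaled dot products -- an illustrative choice).
    2. Refine tokens with a simple graph-convolution / message-passing step.
    3. Aggregate with a mean readout into one sentence vector.
    """
    n, d = H.shape
    A = softmax(H @ H.T / np.sqrt(d))   # latent similarity graph, rows sum to 1
    for _ in range(num_layers):
        H = np.tanh(A @ H)              # one refinement (message-passing) step
    return H.mean(axis=0)               # readout: single d-dimensional vector

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))        # 5 tokens, 8-dim embeddings
sentence_vec = glot_pool(tokens)
print(sentence_vec.shape)               # (8,)
```

Because the graph is built from the tokens themselves, distractor tokens that are dissimilar to the informative ones receive low edge weights, which is one plausible reading of why such a scheme would resist the signal dilution the abstract describes.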