[2511.06571] Rep2Text: Decoding Full Text from a Single LLM Token Representation
Computer Science > Computation and Language
arXiv:2511.06571 (cs)
[Submitted on 9 Nov 2025 (v1), last revised 20 Mar 2026 (this version, v2)]

Title: Rep2Text: Decoding Full Text from a Single LLM Token Representation
Authors: Haiyan Zhao, Zirui He, Yiming Tang, Fan Yang, Ali Payani, Dianbo Liu, Mengnan Du

Abstract: Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we investigate a fundamental question: to what extent can the original input text be recovered from a single last-token representation in an LLM? To this end, we propose Rep2Text, a novel framework for decoding text from last-token representations. Rep2Text employs a trainable adapter that maps a target model's last-token representation into the token embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments across various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B, etc.) show that, on average, roughly half of the tokens in 16-token sequences can be recovered from this compressed representation while preserving strong semantic coherence. Further analysis reveals a clear information bottleneck effect: as sequence length increases, token-level recovery declines, while se...
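The adapter described in the abstract can be illustrated with a minimal sketch: a single linear projection that maps a target model's last-token hidden state into a decoder's token-embedding space, producing a soft-prompt vector for the decoder. The dimensions, the single-layer form, and the function names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
d_target = 4096   # hidden size of the target model (e.g. Llama-3.1-8B)
d_decoder = 3072  # token-embedding size of a hypothetical decoder model

# Trainable adapter: here a single linear layer mapping the target
# representation into the decoder's embedding space. The paper's
# adapter may be deeper or structured differently.
W = rng.standard_normal((d_target, d_decoder)) * 0.02
b = np.zeros(d_decoder)

def adapt(last_token_rep: np.ndarray) -> np.ndarray:
    """Project a last-token representation into decoder embedding space."""
    return last_token_rep @ W + b

h = rng.standard_normal(d_target)  # last-token representation from target LLM
soft_prompt = adapt(h)             # would be fed to the decoder as a prefix
print(soft_prompt.shape)           # (3072,)
```

In practice the resulting vector would be prepended to the decoding model's input embeddings, and the decoder would autoregressively generate the reconstructed text conditioned on it.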