[2601.20009] LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Computer Science > Computation and Language
arXiv:2601.20009 (cs)
[Submitted on 27 Jan 2026 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Authors: J. Ben Tamo, Daniel Carlander-Reuterfelt, Jonathan Rubin, Dezhi Hong, Mingxian Wang, Oleg Poliannikov

Abstract: Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control: the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning the MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability methods, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final ...
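The logit-lens extension described in the abstract reads each intermediate hidden state through the model's unembedding matrix and sums the resulting probability mass over tokens of each language. A minimal NumPy sketch of that idea, using toy random weights and a hypothetical token-to-language tagging (the paper works with real LLM checkpoints and vocabularies, which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; real models are far larger).
n_layers, d_model, vocab = 6, 16, 40

# Hypothetical language tag per vocab token: first half "en",
# second half the target language "tgt".
lang_of_token = np.array(["en"] * (vocab // 2) + ["tgt"] * (vocab // 2))

# Shared unembedding matrix, as in the standard logit lens.
W_U = rng.normal(size=(d_model, vocab))

# Stand-in residual-stream states, one per layer, for one position.
hidden = rng.normal(size=(n_layers, d_model))

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def target_language_prob(h):
    """Project a hidden state to vocab space and sum the probability
    mass assigned to target-language tokens."""
    p = softmax(h @ W_U)
    return p[lang_of_token == "tgt"].sum()

# Track the target-language probability layer by layer.
for layer, h in enumerate(hidden):
    print(f"layer {layer}: P(target-language token) = "
          f"{target_language_prob(h):.3f}")
```

With random weights the per-layer curve is of course meaningless; on a trained model, the paper's three-phase picture predicts this probability staying low through the shared-semantics and reasoning layers and rising sharply in the late, language-specific layers.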
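The abstract's remedy, selective fine-tuning of only the final layers, amounts to freezing every parameter outside a chosen tail of the network before running the optimizer. A schematic NumPy sketch of that freezing pattern (layer count, matrix shapes, and the single hand-rolled update step are all illustrative, not the paper's training setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "model": one weight matrix per layer.
n_layers, k_tuned = 6, 2                       # tune only the final 2 layers
params = [rng.normal(size=(4, 4)) for _ in range(n_layers)]
frozen = [i < n_layers - k_tuned for i in range(n_layers)]

before = [p.copy() for p in params]

# One step of a toy optimizer: every layer "receives" a gradient,
# but the update is applied only to the unfrozen final layers.
lr = 0.1
for i, p in enumerate(params):
    grad = rng.normal(size=p.shape)            # stand-in for a backward pass
    if not frozen[i]:
        p -= lr * grad

for i in range(n_layers):
    changed = not np.allclose(params[i], before[i])
    print(f"layer {i}: {'updated' if changed else 'frozen'}")
```

In a framework like PyTorch the same effect is achieved by setting `requires_grad = False` on the frozen parameters, so the optimizer never touches them; the early shared-semantic and mid reasoning layers stay intact while the late language-generation layers adapt.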