[2602.19109] Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions
Summary
The paper examines three-digit addition in Meta-Llama-3-8B, characterizing how arithmetic results are finalized after cross-token routing becomes causally irrelevant, with the last input token playing the central role.
Why It Matters
This research pins down where and how Llama-3 finalizes arithmetic answers: past a sharp layer boundary, the result is written at the last token without further cross-token routing. Mechanistic findings of this kind support interpretability-driven analysis and targeted causal editing of model behavior on numerical tasks.
Key Takeaways
- Post-routing arithmetic in Llama-3 relies almost entirely on the last input token: beyond the boundary layer, it alone controls the decoded sum.
- Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17, beyond which late-layer self-attention is largely dispensable.
- Digit(-sum) direction dictionaries vary with the next-higher-digit context but remain related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment).
- Naive cross-context transfer of digit edits fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not recover.
- This research can inform future AI model designs, particularly in arithmetic tasks.
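The causal residual patching mentioned above can be sketched in miniature. The following is a hedged illustration, not the paper's pipeline: a toy numpy residual stack stands in for Llama-3, and the layer count, dimensions, and "clean"/"corrupt" inputs are all illustrative assumptions. The logic is the paper's, though: cache the clean run's last-token residual after each layer, splice it into a corrupted run, and ask at which layer the patch alone restores the clean last-token output.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers, seq = 16, 6, 5  # toy sizes, not Llama-3's
Ws = [rng.normal(scale=0.2, size=(d_model, d_model)) for _ in range(n_layers)]

def forward(x, patch_layer=None, patch_value=None):
    """Run the residual stack; optionally overwrite the last-token
    residual right after `patch_layer` with `patch_value`."""
    x = x.copy()
    for i, W in enumerate(Ws):
        x = x + np.tanh(x @ W)       # toy residual block
        if i == patch_layer:
            x[-1] = patch_value      # patch the last-token residual
    return x

clean = rng.normal(size=(seq, d_model))    # stand-in for the clean prompt
corrupt = rng.normal(size=(seq, d_model))  # stand-in for a corrupted prompt

clean_out = forward(clean)
# Cache the clean last-token residual after each layer.
caches, x = [], clean.copy()
for W in Ws:
    x = x + np.tanh(x @ W)
    caches.append(x[-1].copy())

# Patch each layer's clean residual into the corrupted run and measure
# how much of the clean last-token output it restores.
for L in range(n_layers):
    patched = forward(corrupt, patch_layer=L, patch_value=caches[L])
    num = patched[-1] @ clean_out[-1]
    den = np.linalg.norm(patched[-1]) * np.linalg.norm(clean_out[-1])
    print(f"layer {L}: last-token recovery {num / den:+.3f}")
```

In the real experiment the recovery metric is the decoded sum's logit rather than cosine similarity, and a sharp jump in recovery around a layer (near layer 17 in the paper) marks the boundary past which the last-token residual alone determines the answer.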
Computer Science > Artificial Intelligence

arXiv:2602.19109 (cs) [Submitted on 22 Feb 2026]

Title: Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions
Authors: Yao Yan

Abstract: We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17: beyond it, the decoded sum is controlled almost entirely by the last input token and late-layer self-attention is largely dispensable. In this post-routing regime, digit(-sum) direction dictionaries vary with a next-higher-digit context but are well-related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment). Causal digit editing matches this geometry: naive cross-context transfer fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not recover.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.19109 [cs.AI] (or arXiv:2602.19109v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.19109
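The low-rank Procrustes alignment described in the abstract can be sketched on synthetic data. This is a minimal illustration, assuming the shared subspace is taken from the top right singular vectors of the stacked dictionaries (the paper's exact subspace-selection procedure may differ); the dimensions, rank, and synthetic dictionaries are invented for the demo.

```python
import numpy as np

def procrustes_rotation(D_a, D_b, rank):
    """Fit an approximately orthogonal map M inside a shared low-rank
    subspace so that D_a @ M ~ D_b (orthogonal Procrustes)."""
    # Shared subspace: top-`rank` right singular vectors of the stacked
    # dictionaries (an assumption about how the subspace is chosen).
    _, _, Vt = np.linalg.svd(np.vstack([D_a, D_b]), full_matrices=False)
    P = Vt[:rank].T                    # (d, rank) orthonormal basis
    A, B = D_a @ P, D_b @ P            # project both dictionaries
    U, _, Wt = np.linalg.svd(A.T @ B)  # orthogonal Procrustes solution
    R = U @ Wt                         # rotation inside the subspace
    return P @ R @ P.T                 # lift back to full space

# Synthetic check: two "context" dictionaries related by a rotation
# inside an r-dimensional subspace, matching the paper's geometry.
rng = np.random.default_rng(0)
d, r = 64, 8
P0 = np.linalg.qr(rng.normal(size=(d, r)))[0]  # shared subspace basis
R0 = np.linalg.qr(rng.normal(size=(r, r)))[0]  # within-subspace rotation
C = rng.normal(size=(10, r))                   # one row per digit 0-9
D_a = C @ P0.T                                 # context-a digit directions
D_b = C @ R0 @ P0.T                            # context-b digit directions

M = procrustes_rotation(D_a, D_b, rank=r)
# Naive transfer (using D_a directly) misses; rotated directions match,
# mirroring the paper's finding that only rotated edits succeed.
print("naive error:  ", np.linalg.norm(D_a - D_b))
print("rotated error:", np.linalg.norm(D_a @ M - D_b))
```

The design choice mirrors the paper's causal result: because the two dictionaries differ by an approximately orthogonal within-subspace map, transplanting raw directions across contexts fails, while directions passed through the fitted rotation land on the target dictionary.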