[2602.19109] Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions

arXiv - AI · 3 min read

Summary

The paper examines three-digit addition in Meta-Llama-3-8B, characterizing how arithmetic results are finalized after cross-token routing becomes causally irrelevant, with the last input token playing the central role.

Why It Matters

This research provides insight into the inner workings of Llama-3's arithmetic capabilities, showing how the model writes its answer at the last token position beyond a certain depth. Understanding these mechanisms supports mechanistic interpretability and precise causal editing of model behavior on numerical tasks.

Key Takeaways

  • Beyond a certain depth, the decoded sum in Llama-3 is controlled almost entirely by the last input token, and late-layer self-attention is largely dispensable.
  • Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17.
  • Digit-direction dictionaries vary with the next-higher-digit context but are related by an approximately orthogonal map inside a shared low-rank subspace.
  • Naive cross-context transfer of digit edits fails, while rotating directions through the learned map restores strict counterfactual edits.
  • This geometry can inform future interpretability and model-editing work on arithmetic tasks.
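The residual-patching intervention behind the layer boundary can be sketched as a toy numpy experiment. This is illustrative only: the two-layer network, shapes, and inputs below are made up and are not the paper's setup; the point is just the mechanics of overwriting a residual activation from one run inside another.

```python
import numpy as np

# Toy residual-stream "patching": run a 2-layer toy network on a clean
# and a corrupted input, then copy (patch) the corrupted run's layer-1
# residual into the clean run and check that the output follows the patch.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 4))
W2 = rng.standard_normal((4, 4))

def forward(x, patch=None):
    h = x + np.tanh(x @ W1)   # layer-1 residual update
    if patch is not None:
        h = patch             # causal intervention: overwrite the residual
    out = h + np.tanh(h @ W2) # layer-2 residual update
    return h, out

x_clean = np.ones(4)
x_corrupt = -np.ones(4)

h_corrupt, out_corrupt = forward(x_corrupt)
_, out_patched = forward(x_clean, patch=h_corrupt)

# Once the residual at the patch point is overwritten, everything
# downstream is determined by it: the clean run now reproduces the
# corrupted run's output.
print(np.allclose(out_patched, out_corrupt))
```

In the paper's setting the analogous experiment patches the last-token residual between prompts; the boundary is where such patches start fully controlling the decoded sum.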

Computer Science > Artificial Intelligence
arXiv:2602.19109 (cs) [Submitted on 22 Feb 2026]

Title: Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions
Authors: Yao Yan

Abstract: We study three-digit addition in Meta-Llama-3-8B (base) under a one-token readout to characterize how arithmetic answers are finalized after cross-token routing becomes causally irrelevant. Causal residual patching and cumulative attention ablations localize a sharp boundary near layer 17: beyond it, the decoded sum is controlled almost entirely by the last input token, and late-layer self-attention is largely dispensable. In this post-routing regime, digit(-sum) direction dictionaries vary with a next-higher-digit context but are well related by an approximately orthogonal map inside a shared low-rank subspace (low-rank Procrustes alignment). Causal digit editing matches this geometry: naive cross-context transfer fails, while rotating directions through the learned map restores strict counterfactual edits; negative controls do not recover.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.19109 [cs.AI] (or arXiv:2602.19109v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.19109
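The orthogonal-map alignment described in the abstract can be illustrated with a generic orthogonal Procrustes fit in numpy. This is a sketch under made-up shapes, not the authors' code: two synthetic "digit direction" dictionaries are constructed to differ by an unknown rotation, and the rotation is recovered from an SVD.

```python
import numpy as np

# Hypothetical illustration: two direction dictionaries (one per carry
# context), related by an unknown orthogonal map. Shapes are made up.
rng = np.random.default_rng(0)
d_model, n_digits = 64, 10

D_a = rng.standard_normal((n_digits, d_model))           # directions, context A
R_true, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
D_b = D_a @ R_true                                       # context B = rotated A

# Orthogonal Procrustes: find orthogonal R minimizing ||D_a R - D_b||_F.
# The minimizer is R = U V^T from the SVD of D_a^T D_b.
U, _, Vt = np.linalg.svd(D_a.T @ D_b)
R = U @ Vt

# "Naive transfer" (using context-A directions unchanged in context B)
# misses badly, while rotating through the fitted map recovers D_b.
naive_err = np.linalg.norm(D_a - D_b)
rotated_err = np.linalg.norm(D_a @ R - D_b)
print(naive_err > 1.0, rotated_err < 1e-8)
```

The paper's version additionally restricts the fit to a shared low-rank subspace; the sketch above shows only the full-dimensional orthogonal alignment that the edits are rotated through.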
