[2602.17171] In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
Summary
This study compares in-context learning (ICL) performance between linear- and quadratic-attention models on regression tasks, highlighting where linear attention matches quadratic (softmax) attention and where it falls short.
Why It Matters
Linear attention scales linearly with sequence length, while quadratic (softmax) attention scales quadratically, so knowing where their ICL abilities diverge matters when choosing an architecture. This research provides empirical evidence to guide that choice, particularly for regression tasks.
Key Takeaways
- Linear and quadratic attention models exhibit different ICL behaviors.
- Model depth significantly impacts ICL performance.
- The study evaluates learning quality, convergence, and generalization in regression tasks.
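To make the architectural contrast concrete, here is a minimal NumPy sketch of the two attention mechanisms being compared. This is an illustration of the standard formulations, not the authors' implementation; the feature map `phi` (ReLU + 1, standing in for the common elu + 1 choice) and all shapes are assumptions for the example.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Quadratic attention: materializes an n x n score matrix,
    # so cost grows as O(n^2 * d) in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Linear attention: a positive feature map phi replaces the softmax,
    # and associativity lets us compute phi(K)^T V first, giving
    # O(n * d^2) cost -- linear in sequence length n.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                     # (d, d_v), independent of n
    z = Qp @ Kp.sum(axis=0)           # per-query normalizer, shape (n,)
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
out_q = softmax_attention(Q, K, V)   # shape (8, 4)
out_l = linear_attention(Q, K, V)    # shape (8, 4)
```

The study's question is essentially whether the cheaper `linear_attention` form, once trained, can match the ICL behavior of the softmax form on regression prompts.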
Computer Science > Machine Learning
arXiv:2602.17171 (cs) [Submitted on 19 Feb 2026]
Title: In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
Authors: Ayush Goel, Arjun Kohli, Sarvagya Somvanshi
Abstract: Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.17171 [cs.LG] (arXiv:2602.17171v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.17171 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Thu, 19 Feb 2026 08:38:20 UTC (1,244 KB), from Ayush Goel
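The "canonical linear-regression task of Garg et al." mentioned in the abstract can be sketched as follows. This is a hedged reconstruction of that general setup, not the paper's code: the function names, prompt length, and dimension are hypothetical, and the least-squares baseline stands in for the optimal in-context predictor on noise-free data.

```python
import numpy as np

def make_icl_regression_prompt(n_points=10, dim=5, rng=None):
    """Sample one in-context linear-regression prompt: a random weight
    vector w, inputs x_i ~ N(0, I), and targets y_i = w . x_i. A model
    reads the sequence (x_1, y_1, ..., x_k) and must predict y_k from
    the preceding examples alone."""
    rng = rng or np.random.default_rng()
    w = rng.normal(size=dim)
    X = rng.normal(size=(n_points, dim))
    y = X @ w
    return X, y, w

def least_squares_baseline(X_ctx, y_ctx, x_query):
    # Reference predictor: ordinary least squares fit on the in-context
    # pairs, evaluated at the query point. On noise-free data with
    # enough examples this recovers w exactly, so its MSE is ~0.
    w_hat, *_ = np.linalg.lstsq(X_ctx, y_ctx, rcond=None)
    return x_query @ w_hat

X, y, w = make_icl_regression_prompt(n_points=20, dim=5,
                                     rng=np.random.default_rng(1))
pred = least_squares_baseline(X[:-1], y[:-1], X[-1])
mse = (pred - y[-1]) ** 2
```

A trained attention model's per-prompt MSE against this baseline, as the number of in-context examples grows, is the kind of learning-quality curve the study reports.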