[2602.17171] In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

arXiv - AI · 3 min read

Summary

This study compares in-context learning (ICL) performance between linear and quadratic attention models on regression tasks, highlighting where linear attention matches quadratic attention and where it falls short.

Why It Matters

Linear attention offers cheaper inference than quadratic (softmax) attention, so understanding where its ICL performance matches or falls short of quadratic attention directly informs model selection and architecture design, particularly for regression tasks.

Key Takeaways

  • Linear and quadratic attention models exhibit different ICL behaviors.
  • Model depth significantly impacts ICL performance.
  • The study evaluates learning quality, convergence, and generalization in regression tasks.
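
The core architectural contrast the paper studies can be sketched concretely. The snippet below is a minimal NumPy illustration (not the paper's implementation): quadratic attention materializes an n × n score matrix, while linear attention applies a feature map `phi` (here ReLU + 1, one common choice) and reassociates the matrix product so cost scales linearly in sequence length.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Quadratic attention: the n x n score matrix costs O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Linear attention: map queries/keys through a positive feature map,
    # then reassociate (Q K^T) V as Q (K^T V), costing O(n * d^2).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # d x d summary of keys and values
    Z = Qp @ Kp.sum(axis=0)          # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

Both produce outputs of the same shape; the paper's question is how much ICL ability is lost when the softmax score matrix is replaced by the factored linear form.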

Computer Science > Machine Learning

arXiv:2602.17171 (cs) [Submitted on 19 Feb 2026]

Title: In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

Authors: Ayush Goel, Arjun Kohli, Sarvagya Somvanshi

Abstract: Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.17171 [cs.LG] (or arXiv:2602.17171v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.17171 (arXiv-issued DOI via DataCite, pending registration)

Submission history: [v1] Thu, 19 Feb 2026 08:38:20 UTC (1,244 KB), submitted by Ayush Goel
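
The canonical Garg et al. setup the abstract refers to can be sketched as follows. This is a hedged illustration under common assumptions (a fresh Gaussian weight vector per prompt, noiseless labels); the paper's exact distributions and dimensions may differ. Ordinary least squares on the context is the standard reference solution a trained model's MSE is compared against.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 8, 16

# One in-context regression prompt: a fresh task w per prompt,
# with (x_i, y_i) pairs given as context.
w = rng.normal(size=d)
X = rng.normal(size=(n_context, d))
y = X @ w

# Query point the model must predict from the context alone.
x_q = rng.normal(size=d)
y_q = x_q @ w

# Least-squares fit to the context: the natural baseline predictor.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = (x_q @ w_hat - y_q) ** 2
print(f"baseline query MSE: {mse:.2e}")
```

With noiseless labels and more context points than dimensions, least squares recovers w essentially exactly, so the baseline MSE is near zero; the empirical question is how closely each attention variant approaches this as context length and depth grow.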
