[2602.17171] In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

arXiv - AI · 3 min read

Summary

This study compares in-context learning (ICL) performance between linear and quadratic attention models on regression tasks, highlighting where linear attention matches quadratic attention and where it falls short.

Why It Matters

Linear attention offers cheaper inference than quadratic (softmax) attention, so understanding where its ICL performance matches or falls short of quadratic attention directly informs model selection and architecture design, particularly for regression tasks.

Key Takeaways

  • Linear and quadratic attention models exhibit different ICL behaviors.
  • Model depth significantly impacts ICL performance.
  • The study evaluates learning quality, convergence, and generalization in regression tasks.
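
The core architectural contrast the paper studies can be sketched concretely. The snippet below is a minimal NumPy illustration (not the paper's implementation): quadratic attention materializes an n × n score matrix, while linear attention applies a feature map `phi` (here ReLU + 1, one common choice) and reassociates the matrix product so cost scales linearly in sequence length.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Quadratic attention: the n x n score matrix costs O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Linear attention: map queries/keys through a positive feature map,
    # then reassociate (Q K^T) V as Q (K^T V), costing O(n * d^2).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # d x d summary of keys and values
    Z = Qp @ Kp.sum(axis=0)          # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

Both produce outputs of the same shape; the paper's question is how much ICL ability is lost when the softmax score matrix is replaced by the factored linear form.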

Computer Science > Machine Learning

arXiv:2602.17171 (cs) [Submitted on 19 Feb 2026]

Title: In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

Authors: Ayush Goel, Arjun Kohli, Sarvagya Somvanshi

Abstract: Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.17171 [cs.LG] (or arXiv:2602.17171v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.17171 (arXiv-issued DOI via DataCite, pending registration)

Submission history: [v1] Thu, 19 Feb 2026 08:38:20 UTC (1,244 KB), submitted by Ayush Goel
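
The canonical Garg et al. setup the abstract refers to can be sketched as follows. This is a hedged illustration under common assumptions (a fresh Gaussian weight vector per prompt, noiseless labels); the paper's exact distributions and dimensions may differ. Ordinary least squares on the context is the standard reference solution a trained model's MSE is compared against.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 8, 16

# One in-context regression prompt: a fresh task w per prompt,
# with (x_i, y_i) pairs given as context.
w = rng.normal(size=d)
X = rng.normal(size=(n_context, d))
y = X @ w

# Query point the model must predict from the context alone.
x_q = rng.normal(size=d)
y_q = x_q @ w

# Least-squares fit to the context: the natural baseline predictor.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = (x_q @ w_hat - y_q) ** 2
print(f"baseline query MSE: {mse:.2e}")
```

With noiseless labels and more context points than dimensions, least squares recovers w essentially exactly, so the baseline MSE is near zero; the empirical question is how closely each attention variant approaches this as context length and depth grow.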
