[2602.14432] S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations

Summary

The paper introduces Selective Spectral Decay (S2D), a fine-tuning method that suppresses activation outliers in neural networks, making models easier to quantize without sacrificing accuracy.

Why It Matters

With the increasing scale of transformer models, managing activation outliers is crucial for maintaining accuracy during quantization. S2D offers a novel approach to this problem, potentially impacting deployment efficiency across various AI applications.

Key Takeaways

  • S2D reduces activation outliers in large-scale transformer models.
  • The method improves post-training quantization accuracy by up to 7% on ImageNet.
  • S2D generalizes well across different downstream tasks and vision-language models.
  • The approach allows for scaling models without compromising deployment efficiency.
  • Empirical studies link activation outliers to dominant singular values of weights.
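The failure mode behind these takeaways is easy to reproduce: with symmetric per-tensor quantization, a single activation outlier inflates the scale and degrades precision for every other value. A minimal NumPy sketch of this effect (illustrative only, not the paper's code; all names are assumptions):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 fake-quantization: the scale is set
    by the largest magnitude in the tensor, so one outlier stretches
    the whole range."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale  # dequantized values

rng = np.random.default_rng(0)
acts = rng.normal(size=1000)
err_clean = np.mean((acts - quantize_int8(acts)) ** 2)

acts_outlier = acts.copy()
acts_outlier[0] = 100.0  # single activation outlier
err_outlier = np.mean((acts_outlier - quantize_int8(acts_outlier)) ** 2)
```

Because the quantization step grows linearly with the maximum magnitude, the mean-squared error on the non-outlier activations grows roughly quadratically with the outlier's size, which is why outlier reduction matters so much for post-training quantization.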

Computer Science > Machine Learning · arXiv:2602.14432 (cs) · Submitted on 16 Feb 2026

Authors: Arnav Chavan, Nahush Lele, Udbhav Bamba, Sankalp Dayal, Aditi Raghunathan, Deepak Gupta

Abstract: Activation outliers in large-scale transformer models pose a fundamental challenge to model quantization, creating excessively large ranges that cause severe accuracy drops during quantization. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis as well as empirical correlation studies, we establish the direct link between these activation outliers and dominant singular values of the weights. Building on this insight, we propose Selective Spectral Decay ($S^2D$), a geometrically-principled conditioning method that surgically regularizes only the weight components corresponding to the largest singular values during fine-tuning. Through extensive experiments, we demonstrate that $S^2D$ significantly reduces activation outliers and produces well-conditioned representations that are inherently quantizatio...
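Given the stated link between dominant singular values and activation outliers, the core idea can be sketched as shrinking only the top-k singular values of a weight matrix during fine-tuning, leaving the rest of the spectrum untouched. A hypothetical NumPy illustration (not the authors' implementation; `k`, `decay`, `lam`, and the function names are assumptions):

```python
import numpy as np

def selective_spectral_decay_penalty(W, k=1, lam=1e-3):
    """Regularization term that targets only the top-k singular
    values of W (illustrative; the paper's exact penalty may differ)."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return lam * np.sum(s[:k] ** 2)

def apply_spectral_decay_step(W, k=1, decay=0.01):
    """One explicit decay step: shrink the top-k singular values by a
    factor (1 - decay) and rebuild W, leaving the tail spectrum intact."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s[:k] *= (1.0 - decay)
    return U @ np.diag(s) @ Vt
```

The selectivity is the point: uniform weight decay shrinks the whole spectrum and can hurt the useful low-energy directions, whereas decaying only the dominant singular values directly targets the components the paper links to activation outliers.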
