[2602.17063] Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

[2602.17063] Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

arXiv - AI 3 min read Article

Summary

The paper discusses 'Sign Lock-In,' a phenomenon in machine learning where randomly initialized weight signs persist during model training, impacting sub-bit model compression.

Why It Matters

Understanding sign persistence in neural networks is crucial for optimizing model compression techniques, particularly as AI models grow larger and require efficient storage solutions. This research provides insights that could enhance performance while reducing resource consumption.

Key Takeaways

  • Sign lock-in theory explains the persistence of weight signs during training.
  • The phenomenon affects model compression, particularly in sub-bit scenarios.
  • A gap-based initialization and regularization technique can reduce sign flip rates significantly.

Computer Science > Machine Learning arXiv:2602.17063 (cs) [Submitted on 19 Feb 2026] Title:Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression Authors:Akira Sakai, Yuma Ichikawa View a PDF of the paper titled Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression, by Akira Sakai and 1 other authors View PDF Abstract:Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer, reducing the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity. Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs...

Related Articles

Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[D] ICML reviewer making up false claim in acknowledgement, what to do?

In a rebuttal acknowledgement we received, the reviewer made up a claim that our method performs worse than baselines with some hyperpara...

Reddit - Machine Learning · 1 min ·
UMKC Announces New Master of Science in Artificial Intelligence
Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
Machine Learning

[D] Budget Machine Learning Hardware

Looking to get into machine learning and found this video on a piece of hardware for less than £500. Is it really possible to teach auton...

Reddit - Machine Learning · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime