[2602.17063] Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
Summary
The paper introduces 'Sign Lock-In,' a phenomenon in which randomly initialized weight signs persist throughout training, making the sign bit a fixed-cost bottleneck for sub-bit model compression.
Why It Matters
Understanding sign persistence in neural networks is crucial for optimizing model compression, particularly as AI models grow larger and demand efficient storage. This research suggests that controlling sign flips could push storage below one bit per weight with only a small accuracy cost.
Key Takeaways
- Sign lock-in theory explains the persistence of weight signs during training.
- The phenomenon affects model compression, particularly in sub-bit scenarios.
- A gap-based initialization combined with a lightweight outward-drift regularizer reduces the effective sign flip rate to approximately 10^-3, at the cost of only about a one-point increase in perplexity.
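To make the takeaways concrete, here is a minimal sketch in NumPy of two hypothetical helpers: one that measures the fraction of weights whose sign has flipped relative to initialization, and one gap-based initializer that samples magnitudes bounded away from zero so that small SGD noise cannot easily cross the sign boundary. The function names, the `gap` parameter, and the half-normal magnitude distribution are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sign_flip_rate(w_init, w_trained):
    """Fraction of weights whose sign differs from initialization.
    Weights that are exactly zero in either array are excluded."""
    nz = (w_init != 0) & (w_trained != 0)
    return np.mean(np.sign(w_init[nz]) != np.sign(w_trained[nz]))

def gap_init(shape, gap=0.05, scale=0.1, rng=None):
    """Hypothetical gap-based initializer: random Rademacher signs
    with magnitudes at least `gap` away from zero (half-normal bulk)."""
    rng = rng or np.random.default_rng(0)
    signs = rng.choice([-1.0, 1.0], size=shape)
    mags = gap + np.abs(rng.normal(0.0, scale, size=shape))
    return signs * mags
```

With a gap of this kind, a weight must traverse the full dead zone around zero before its sign can flip, which is the "rare re-entry" event the theory bounds.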
Computer Science > Machine Learning
arXiv:2602.17063 (cs)
[Submitted on 19 Feb 2026]
Title: Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression
Authors: Akira Sakai, Yuma Ichikawa
Abstract: Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer, reducing the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs...
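The abstract's outward-drift regularizer can be sketched as a penalty on weights that enter a small neighborhood around zero, so that gradient descent pushes them back outward before a sign flip can complete. The quadratic-hinge form and the `margin` parameter below are assumptions for illustration; the paper's exact regularizer may differ.

```python
import numpy as np

def outward_drift_penalty(w, margin=0.05):
    """Hypothetical outward-drift regularizer: quadratic hinge penalty
    on weights inside (-margin, margin), zero outside it."""
    inside = np.maximum(margin - np.abs(w), 0.0)
    return np.mean(inside ** 2)

def outward_drift_grad(w, margin=0.05):
    """Gradient of the penalty w.r.t. w. For a weight inside the
    margin the gradient points toward zero, so a gradient-descent
    step (w -= lr * grad) moves the weight away from zero."""
    inside = np.abs(w) < margin
    g = -2.0 * np.maximum(margin - np.abs(w), 0.0) * np.sign(w) / w.size
    return np.where(inside, g, 0.0)
```

Adding `lam * outward_drift_penalty(w)` to the training loss raises the effective barrier at zero, which under the paper's geometric-tail analysis should make re-entry events, and hence sign flips, exponentially rarer.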