[2505.12988] Optimal Formats for Weight Quantisation

arXiv - Machine Learning · 4 min read

Summary

This paper presents a systematic framework for designing weight quantisation formats in deep learning, demonstrating that variable-length codes can enhance performance and reduce model size.

Why It Matters

Weight quantisation is crucial for optimising deep learning models, especially in resource-constrained environments. This research provides a structured approach to designing quantisation formats, potentially leading to more efficient AI applications and easier model deployment.

Key Takeaways

  • Proposes a framework for systematic design of quantisation formats.
  • Highlights the advantages of variable-length coding in quantisation.
  • Demonstrates improved performance of non-linear quantisation curves.
  • Shows potential savings of up to 0.25 bits per parameter in large language models.
  • Connects quantisation design with classical quantisation theory.
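The savings from variable-length codes can be seen in a minimal sketch (assumptions, not the paper's code: standard-normal weights, a 4-bit uniform grid spanning roughly [-4, 4]). The empirical entropy of the resulting code distribution bounds the cost an ideal variable-length coder would approach, and it comes in well under the 4-bit fixed-length cost:

```python
# Sketch: quantise Gaussian "weights" on a uniform grid, then compare the
# fixed-length cost (bits per code word) against the empirical entropy of
# the codes, the rate an ideal entropy coder would approach.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=100_000)

levels = 16                     # 4-bit fixed-length code
step = 8.0 / levels             # uniform grid over roughly [-4, 4]
codes = np.clip(np.round(weights / step), -levels // 2, levels // 2 - 1)

# Empirical entropy of the code distribution (bits per parameter).
_, counts = np.unique(codes, return_counts=True)
p = counts / counts.sum()
entropy = -(p * np.log2(p)).sum()

fixed_bits = np.log2(levels)
print(f"fixed-length cost : {fixed_bits:.2f} bits/param")
print(f"entropy (VL bound): {entropy:.2f} bits/param")
print(f"potential saving  : {fixed_bits - entropy:.2f} bits/param")
```

Because the Gaussian concentrates probability on the central grid points, the code distribution is far from uniform, which is exactly the slack a variable-length code exploits.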

Computer Science > Machine Learning
arXiv:2505.12988 (cs)
[Submitted on 19 May 2025 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: Optimal Formats for Weight Quantisation
Authors: Douglas Orr, Luka Ribar, Carlo Luschi

Abstract: Weight quantisation is an essential technique for enabling efficient training and deployment of modern deep learning models. However, the recipe book of quantisation formats is large and formats are often chosen empirically. In this paper, we propose a framework for systematic design and analysis of quantisation formats. By connecting the question of format design with classical quantisation theory, we show that the strong practical performance of popular formats comes from their ability to represent values using variable-length codes. We frame the problem as minimising the KL divergence between original and quantised model outputs under a model size constraint, which can be approximated by minimising the squared quantisation error, a well-studied problem where entropy-constrained quantisers with variable-length codes are optimal. We develop non-linear quantisation curves for block-scaled data across multiple distribution families and observe that these formats, along with sparse outlier formats, consistently outperform fixed-length formats, indicating that they also exploit variable-length encoding. Finally, by using the r...
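The block-scaled setting the abstract refers to can be illustrated with a short sketch (assumptions, not the paper's exact format: block size 32, a symmetric 4-bit integer grid, per-block absmax scaling). It measures the squared quantisation error that the paper uses as a tractable proxy for the KL divergence between original and quantised model outputs:

```python
# Sketch of block-scaled weight quantisation: each block of weights shares
# one scale, and we report the squared quantisation error used as a proxy
# objective for the output KL divergence.
import numpy as np

def quantise_block_scaled(w, block=32, bits=4):
    """Quantise a 1-D weight vector with a per-block absmax scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                    # guard against all-zero blocks
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
w_hat = quantise_block_scaled(w)
mse = np.mean((w - w_hat) ** 2)
print(f"mean squared quantisation error: {mse:.5f}")
```

Per-block scales adapt the (linear) grid to the local dynamic range; the paper's non-linear quantisation curves replace the uniform integer grid here with levels shaped to the weight distribution.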
