[2505.22811] Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Summary
The paper presents a novel framework for large language models (LLMs) using multi-kernel Boolean parameters, enhancing efficiency and effectiveness by enabling direct finetuning in the Boolean domain.
Why It Matters
This research addresses the growing need for efficient LLMs amidst increasing model complexity. By introducing a method that reduces both the computational burden and performance loss associated with traditional binarization techniques, it has significant implications for AI applications requiring rapid processing and lower resource consumption.
Key Takeaways
- Introduces a framework for LLMs using multi-kernel Boolean parameters (illustrated in the sketch after this list).
- Eliminates the need for latent weights, enhancing efficiency.
- Demonstrates superior performance over existing low-bit quantization methods.
- Facilitates direct finetuning in the Boolean domain.
- Addresses the challenges of complexity in large language models.
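The paper page does not include code, so the following is a rough illustrative sketch only of what "multi-kernel Boolean parameters" commonly means in multi-bit binarization: approximating a full-precision weight matrix as a sum of scaled sign kernels, W ≈ Σ_k α_k B_k with B_k ∈ {-1, +1}. The class name, the `num_kernels` parameter, and the greedy residual fit are assumptions for illustration, not the authors' actual method (which additionally finetunes directly in the Boolean domain without latent weights).

```python
# Illustrative sketch only: approximate a weight matrix as a sum of
# scaled sign ("Boolean") kernels, W ≈ sum_k alpha_k * B_k, B_k in {-1, +1}.
# Names and the greedy residual fit are assumptions, not the method of
# arXiv:2505.22811.
import torch
import torch.nn as nn


class MultiKernelBooleanLinear(nn.Module):
    def __init__(self, weight: torch.Tensor, num_kernels: int = 2):
        super().__init__()
        kernels, scales = [], []
        residual = weight.clone()
        for _ in range(num_kernels):
            # Greedy fit: the sign of the residual gives the next Boolean
            # kernel, and the mean absolute residual is its optimal scale.
            b = torch.sign(residual)
            b[b == 0] = 1.0
            alpha = residual.abs().mean()
            kernels.append(b)
            scales.append(alpha)
            residual = residual - alpha * b
        # Store kernels as +/-1 buffers; a real system would bit-pack them.
        self.register_buffer("kernels", torch.stack(kernels))  # (K, out, in)
        self.register_buffer("scales", torch.stack(scales))    # (K,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the effective weight from the Boolean kernels and apply it.
        w = (self.scales.view(-1, 1, 1) * self.kernels).sum(dim=0)
        return x @ w.t()


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(8, 16)
    layer = MultiKernelBooleanLinear(w, num_kernels=3)
    x = torch.randn(4, 16)
    print("approx error:", (layer(x) - x @ w.t()).abs().mean().item())
```

In deployed binarized models the {-1, +1} kernels would be bit-packed and the matrix products replaced by XNOR-and-popcount operations, which is where the memory and compute savings come from; the sketch above only shows the multi-kernel representation itself.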
Paper Details
arXiv:2505.22811 (stat.ML) [Submitted on 28 May 2025 (v1), last revised 25 Feb 2026 (this version, v3)]
Title: Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Authors: Ba-Hien Tran, Van Minh Nguyen
Abstract: Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show our method outperforms recent ultra low-bit quantization and binarization techniques.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2505.22811 [stat.ML] (or arXiv:2505.22811v3 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.22811