[2602.01554] InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs
Computer Science > Machine Learning
arXiv:2602.01554 (cs)
[Submitted on 2 Feb 2026 (v1), last revised 6 Apr 2026 (this version, v2)]

Title: InfoTok: Information-Theoretic Regularization for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs
Authors: Lv Tang, Tianyi Zheng, Bo Li, Xingyu Li

Abstract: Unified multimodal large language models (MLLMs) aim to unify image understanding and image generation within a single framework, where a shared visual tokenizer serves as the sole interface that maps high-dimensional images into a limited token budget for downstream multimodal reasoning and synthesis. However, existing shared-token designs are largely architecture-driven and lack an explicit criterion for what information should be preserved to simultaneously support semantic abstraction and visual detail. In this paper, we adopt a capacity-constrained perspective, viewing the shared tokenizer as a compute-bounded learner whose finite representational budget should prioritize reusable structure over hard-to-exploit high-entropy variations and redundancy. Motivated by this view, we propose \textbf{\textit{InfoTok}}, an information-regularized tokenization mechanism grounded in the Information Bottleneck (IB) principle. InfoTok explicitly controls information flow f...
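The abstract is truncated before the objective is stated, so InfoTok's exact regularizer does not appear on this page. As background only, the classical Information Bottleneck objective the abstract refers to, written for an input $X$, a token representation $Z$, and a task-relevant target $Y$, is

\[
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y),
\]

where the $I(X;Z)$ term bounds how much of the input the tokenizer retains under its limited budget, and the $I(Z;Y)$ term, weighted by $\beta$, rewards keeping information useful for the downstream task. This is the standard IB formulation, not the paper's loss; how InfoTok instantiates these terms for a shared visual tokenizer is described in the full paper.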