[2603.03964] BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft
arXiv:2603.03964 (cs), submitted on 4 Mar 2026

Title: BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft
Authors: Hengquan Guo

Abstract: We present BLOCK, an open-source bi-stage character-to-skin pipeline that generates pixel-perfect Minecraft skins from arbitrary character concepts. BLOCK decomposes the problem into (i) a 3D preview synthesis stage driven by a large multimodal model (MLLM) with a carefully designed prompt-and-reference template, producing a consistent dual-panel (front/back) oblique-view Minecraft-style preview; and (ii) a skin decoding stage based on a fine-tuned FLUX.2 model that translates the preview into a skin atlas image. We further propose EvolveLoRA, a progressive LoRA curriculum (text-to-image → image-to-image → preview-to-skin) that initializes each phase from the previous adapter to improve stability and efficiency. BLOCK is released with all prompt templates and fine-tuned weights to support reproducible character-to-skin generation.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.03964 [cs.CV] (or arXiv:2603.03964v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.03964 ...
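The abstract describes EvolveLoRA only as a curriculum that "initializes each phase from the previous adapter." The following is a toy NumPy illustration of that warm-starting idea, not the paper's FLUX.2 training code. It assumes a single frozen linear layer, plain gradient descent, and three synthetic regression tasks as hypothetical stand-ins for the text-to-image, image-to-image, and preview-to-skin phases. The names `LoRAAdapter`, `train_phase`, and `progressive_curriculum` are hypothetical. The adapter follows the usual LoRA form, where a low-rank update (alpha / rank) * B @ A is added to a frozen weight and B starts at zero.

```python
import numpy as np


class LoRAAdapter:
    """Toy LoRA adapter: Delta = (alpha / rank) * B @ A, added to a frozen weight W."""

    def __init__(self, d_out, d_in, rank, alpha, rng):
        self.scale = alpha / rank
        # Common LoRA initialization: A random, B zero, so Delta starts at exactly zero.
        self.A = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def delta(self):
        return self.scale * self.B @ self.A

    def copy(self):
        # Deep copy so later phases cannot mutate an earlier phase's adapter.
        clone = LoRAAdapter.__new__(LoRAAdapter)
        clone.scale, clone.A, clone.B = self.scale, self.A.copy(), self.B.copy()
        return clone


def mse(W, adapter, X, Y):
    """Mean squared error of the adapted linear map (W + Delta) on inputs X."""
    residual = (W + adapter.delta()) @ X - Y
    return float(np.mean(residual ** 2))


def train_phase(W, adapter, X, Y, lr, steps):
    """Plain gradient descent on A and B only; W stays frozen."""
    for _ in range(steps):
        residual = (W + adapter.delta()) @ X - Y
        grad_delta = 2.0 * residual @ X.T / residual.size  # dLoss/dDelta for the mean
        grad_B = adapter.scale * grad_delta @ adapter.A.T    # both gradients use the
        grad_A = adapter.scale * adapter.B.T @ grad_delta    # pre-update A and B
        adapter.B -= lr * grad_B
        adapter.A -= lr * grad_A
    return adapter


def progressive_curriculum(W, phases, rank, alpha, lr, steps, rng):
    """Train phases in order; each phase starts from a copy of the previous adapter."""
    adapter, history = None, []
    for name, X, Y in phases:
        if adapter is None:
            adapter = LoRAAdapter(W.shape[0], W.shape[1], rank, alpha, rng)
        else:
            adapter = adapter.copy()  # warm start instead of re-initializing
        start = mse(W, adapter, X, Y)
        train_phase(W, adapter, X, Y, lr, steps)
        history.append((name, start, mse(W, adapter, X, Y), adapter))
    return history


# Hypothetical toy data: three related regression tasks standing in for the three phases.
rng = np.random.default_rng(0)
d, n, rank = 16, 256, 4
W = rng.normal(size=(d, d))                 # frozen "base model" weight
P = rng.normal(0.0, 0.3, size=(d, rank))
Q = rng.normal(0.0, 0.3, size=(rank, d))
phases = []
for name in ["text-to-image", "image-to-image", "preview-to-skin"]:
    X = rng.normal(size=(d, n))
    # Rank <= 4 target update that drifts slightly from phase to phase.
    target_delta = P @ (Q + rng.normal(0.0, 0.05, size=Q.shape))
    phases.append((name, X, (W + target_delta) @ X))

history = progressive_curriculum(W, phases, rank, alpha=rank, lr=0.5, steps=600, rng=rng)
for name, start, end, _ in history:
    print(f"{name:>16}: loss {start:.4f} -> {end:.4f}")
```

Because the toy tasks share most of their structure, a later phase that starts from the previous adapter begins at a lower loss than it would from a fresh zero-initialized adapter. Copying the adapter, rather than sharing one object, also keeps each phase's result available as its own checkpoint.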