Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
About this article
A blog post by IBM Granite on Hugging Face
Published March 31, 2026
By Madison Lee, Rogerio Feris, Eli Schwartz, Dhiraj Joshi, Pengyuan Li, and Isaac Sanchez

Today we're excited to announce Granite 4.0 3B Vision, a compact vision-language model (VLM) designed for enterprise document understanding. It’s purpose-built for reliable information extraction from complex documents, forms, and structured visuals.

Granite 4.0 3B Vision excels at the following capabilities:

Table Extraction: Accurately parsing complex table structures (e.g., multi-row, multi-column) from document images
Chart Understanding: Converting charts and figures into structured machine-readable formats, summaries, or executable code
Semantic Key-Value Pair (KVP) Extraction: Identifying and grounding semantically meaningful key-value field pairs across diverse document layouts

The model ships as a LoRA adapter on top of Granite 4.0 Micro, our dense language model, keeping vision and language modular for text-only fallbacks and seamless integration into mixed pipelines. It continues to support vision-language tasks such as producing detailed natural-language descriptions from images (e.g., “Describe this image in detail”). The model can be used standalone or i...
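
To make the modularity point concrete, here is a minimal toy sketch of the general LoRA idea: a frozen base weight matrix W plus a low-rank update scaled by alpha / r, which can be switched off to recover the base model's output exactly. This is not Granite's actual implementation or loading code. The function names, the `use_adapter` flag, and the tiny matrices are all hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, use_adapter: bool = True) -> Vector:
    """Toy LoRA layer: y = W x + (alpha / r) * B (A x).

    W is the frozen base weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the low-rank adapter factors.
    With use_adapter=False the adapter is skipped and only W is used,
    which mirrors the idea of a text-only fallback (hypothetical flag).
    """
    base = matvec(W, x)
    if not use_adapter:
        return base
    r = len(A)  # adapter rank = number of rows in A
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Illustrative values only; real layers are far larger.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]      # 2 x 3 base weight
A = [[1.0, 1.0, 1.0]]      # rank r = 1, shape 1 x 3
B = [[0.5], [-0.5]]        # shape 2 x 1
x = [1.0, 2.0, 3.0]

text_only = lora_forward(W, A, B, x, alpha=2.0, use_adapter=False)
with_adapter = lora_forward(W, A, B, x, alpha=2.0, use_adapter=True)

print(text_only)     # [1.0, 2.0]
print(with_adapter)  # [7.0, -4.0]
```

Because the base weights are never modified, skipping the adapter term yields the base language model's behavior unchanged, which is the property that makes a text-only fallback possible in this kind of design.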