Welcome Gemma 4: Frontier multimodal intelligence on device
Published April 2, 2026

By merve, Pedro Cuenca (pcuenq), Sergio Paniego (sergiopaniego), Ben Burtenshaw (burtenshaw), Steven Zheng (Steveeeeeeen), and Alvaro Bartolome (alvarobartt)

The Gemma 4 family of multimodal models by Google DeepMind is out on Hugging Face, with support for your favorite agents, inference engines, and fine-tuning libraries 🤗 These models are the real deal: truly open with Apache 2.0 licenses, high quality with Pareto-frontier arena scores, multimodal including audio, and available in sizes you can use everywhere, including on-device.

Gemma 4 builds on advances from previous model families and makes them click together. In our tests with pre-release checkpoints, we were so impressed by the models' out-of-the-box quality that we struggled to find fine-tuning examples that meaningfully improved on it. We collaborated with Google and the community to make the models available everywhere: transformers, llama.cpp, MLX, WebGPU, Rust; you name it. This blog post shows you how to build with your favorite tools, so let us know what you think! For a quick first taste, see the transformers sketch right after the table of contents.

Table of Contents

- What is New with Gemma 4?
- Overview of Capabilities and Architecture
  - Architecture at a Glance
  - Per-Layer Embeddings (PLE)
  - Shared KV Cache
  - Multimodal Capabilities
- Deploy Anywhere
  - transformers
  - Llama.cpp
  - Plug in to your local agent
  - transformers.js
  - MLX
  - Mistral.rs
- Fine-tuning & Demos
  - Fine-tuni…
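Before the tool-by-tool walkthrough, here is a minimal sketch of what running a Gemma 4 checkpoint with the transformers `pipeline` API might look like. The model id below is a placeholder, not a confirmed checkpoint name (check the model cards on the Hub for the actual ids), and the chat-style input assumes a recent transformers release that accepts message lists in text-generation pipelines.

```python
# Minimal sketch, assuming a recent transformers release with chat-aware
# text-generation pipelines. The model id is a placeholder; look up the
# real Gemma 4 checkpoint names on the Hugging Face Hub.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-small-it",  # hypothetical id for illustration
    device_map="auto",                # place weights on available GPU(s)/CPU
)

messages = [
    {"role": "user", "content": "Explain per-layer embeddings in one sentence."},
]

# The pipeline applies the model's chat template, generates a reply, and
# returns the full conversation; the last message is the assistant's answer.
result = generator(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```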