Google released Gemma 3 on March 12, 2025, a family of open-weight multimodal models in four sizes (1B, 4B, 12B, 27B) built on Gemini 2.0 technology. The models support a 128k-token context window (32k on the 1B model), 140+ languages, and image-plus-text input, with open weights available on Hugging Face and Kaggle.
What Happened
Google DeepMind released Gemma 3 as its newest open-weight model family, with sizes ranging from 1B (text-only) to 27B (full multimodal). All models above 1B accept image and text input and produce text output. The official announcement describes Gemma 3 as built from the same research and infrastructure that powers Gemini 2.0.
In preliminary human preference evaluations on LMArena, Gemma 3 27B outscores Llama 3-405B, DeepSeek-V3, and o3-mini, at a fraction of their parameter counts. The Hugging Face integration post highlights strong performance on coding, reasoning, and instruction-following tasks.
Why It Matters
The 128k context window on the 4B, 12B, and 27B models is the headline for creative AI users. It means you can pass entire scripts, long design briefs, or large code files into a locally run model and get coherent responses across the full input. Previous open-weight models in this size range typically topped out at 8k-32k of context.
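Before feeding a whole script or codebase to a local model, it helps to sanity-check that it actually fits in the window. The sketch below is not part of Gemma's tooling; the ~4-characters-per-token ratio is a crude heuristic for English prose, and an accurate count requires the model's own tokenizer:

```python
# Rough sketch: estimate whether a document fits in the 128k-token window.
# CHARS_PER_TOKEN = 4 is a crude heuristic for English text, not a Gemma
# constant; a real count needs the model's tokenizer.

CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Check that the estimated prompt leaves room for the model's reply."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

script = "INT. OFFICE - DAY\n" * 10_000  # a 180,000-character screenplay
print(estimate_tokens(script), fits_in_context(script))  # 45000 True
```

A 180,000-character screenplay estimates to roughly 45k tokens, comfortably inside the window; a codebase several times that size would need chunking.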
The multimodal capability at the 4B size is particularly useful for creative workflows: at 4B parameters, Gemma 3 runs on consumer laptops with 8-16 GB of RAM while still accepting image input. This opens up local visual analysis, image-to-text workflows, and design critique pipelines without cloud dependency.
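As a sketch of what a local image-plus-text call might look like, the snippet below assembles an OpenAI-compatible chat payload with an inline base64 image, the request shape that local servers such as Ollama and llama.cpp expose. The `gemma3:4b` model tag and the message schema are assumptions for illustration, not an official Gemma API:

```python
import base64

def build_image_prompt(image_bytes: bytes, question: str,
                       model: str = "gemma3:4b") -> dict:
    """Assemble an OpenAI-style chat payload with an inline base64 image.
    The model tag and message schema are assumptions for illustration."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

fake_png = b"\x89PNG\r\n\x1a\n"  # PNG signature stands in for real image bytes
payload = build_image_prompt(fake_png, "Critique this layout for visual hierarchy.")
print(payload["messages"][0]["content"][0]["text"])
# Critique this layout for visual hierarchy.
```

Posting a payload like this to a local server keeps the design-critique loop entirely offline.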
The license permits commercial use without royalties, but it is Google's own Gemma Terms of Use rather than an Apache-style open-source license, and it includes a prohibited-use policy; review the terms before shipping a product integration.
Key Details
- Sizes: 1B (text-only), 4B, 12B, 27B (all multimodal except 1B)
- Context window: 128k tokens (32k on the 1B model)
- Languages: 140+ supported
- Input: text and images (4B, 12B, 27B); text only (1B)
- Output: text
- Based on: Gemini 2.0 research and architecture
- Available on: Hugging Face and Kaggle
- License: Gemma Terms of Use (permits commercial use)
What to Do Next
The 4B instruction-tuned variant is the best starting point for most creative workflows. It runs locally on most modern laptops and handles image analysis, long-document summarization, and multilingual tasks. Download the weights from Hugging Face or pull the model through Ollama. The DeepMind model page has detailed technical specifications and a live demo.
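If you script the setup, a quick check that a local Ollama server is actually running before pulling saves a confusing failure. The default port (11434) and the `gemma3:4b` tag below are assumptions about Ollama's conventions, not verified against its documentation:

```python
import subprocess
import urllib.error
import urllib.request

def ollama_running(host: str = "http://localhost:11434",
                   timeout: float = 2.0) -> bool:
    """Probe Ollama's default HTTP port; True only on a successful response."""
    try:
        with urllib.request.urlopen(host, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

def pull_model(tag: str = "gemma3:4b") -> None:
    """Shell out to `ollama pull`; the tag is an assumed naming scheme."""
    if not ollama_running():
        raise RuntimeError("Ollama not reachable; start it with `ollama serve`")
    subprocess.run(["ollama", "pull", tag], check=True)
```

Once the pull finishes, the same server exposes the model to any local tool that speaks its API, with no cloud round-trip.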