Google DeepMind / HuggingFace, 2026-04-09

Google Releases Gemma 4 — Four Apache 2.0 Multimodal Models With 256K Context That Punch Way Above Their Weight

Google DeepMind released Gemma 4 on April 2, 2026: four Apache 2.0 open-weight models ranging from 2.3B to 31B parameters, all multimodal (image, video, audio, text), with 128K–256K context windows and native tool-calling. The 31B dense model scores 1452 Elo on LMArena, and the 26B MoE variant achieves near-parity with the 31B at only 4B active parameters.


Google DeepMind released Gemma 4 on April 2, 2026, and the open-source AI community has been running benchmarks ever since. The release includes four model sizes — E2B (2.3B effective), E4B (4.5B effective), 26B A4B (a mixture-of-experts with only 4B active parameters), and 31B dense — all available under Apache 2.0 with no commercial restrictions.

The headline capability is multimodality. Every Gemma 4 model handles image and text natively; the two smaller models (E2B and E4B) also support audio. Video understanding is available across the family. Combined with 256K context windows on the two larger models and native JSON-mode tool calling, Gemma 4 is designed to serve as the backbone for agentic systems that need to process rich media inputs without cloud API dependencies.
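To make the tool-calling and multimodal claims concrete, here is a sketch of the kind of JSON-schema tool definition and multimodal chat turn that most serving stacks (Transformers chat templates, llama.cpp) accept today. The schema style and the `get_weather` tool are illustrative assumptions; the exact wire format Gemma 4's chat template expects is not specified in the release summary.

```python
# Hypothetical tool definition in the common JSON-schema style.
# Gemma 4's actual chat template may use a different format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# A multimodal user turn: an image alongside a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "photo.jpg"},
            {"type": "text", "text": "What city is this? Then fetch its weather."},
        ],
    }
]
```

In a real pipeline these structures would be passed to a chat template or server endpoint, which renders them into the model's native prompt format.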

The benchmark numbers are striking for open weights. The 31B model hits 1452 Elo on LMArena — competitive with models twice its size from a year ago. On GPQA Diamond (a hard science reasoning benchmark), the 31B scores 84.3% and the 26B MoE scores 82.3%. The MoE architecture story is particularly compelling: by activating only 4 of 26 billion parameters per token, the 26B model achieves near-31B quality at significantly lower inference cost.
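The MoE efficiency claim can be sanity-checked with a back-of-the-envelope estimate: decode-time compute scales roughly with active parameters (about 2 FLOPs per parameter per generated token), so 4B active versus 31B dense implies roughly a 7.75x reduction in per-token compute. The numbers below are illustrative, not measured throughput.

```python
# Rough decode cost model: ~2 FLOPs per parameter per token.
def flops_per_token(active_params: float) -> float:
    return 2.0 * active_params

dense_31b = flops_per_token(31e9)  # 31B dense: all parameters active
moe_a4b = flops_per_token(4e9)     # 26B A4B MoE: ~4B active per token

ratio = dense_31b / moe_a4b
print(f"Dense 31B:  {dense_31b:.2e} FLOPs/token")
print(f"MoE 4B act: {moe_a4b:.2e} FLOPs/token")
print(f"Compute ratio: {ratio:.2f}x")
```

Real-world speedups will be smaller than the raw FLOP ratio, since MoE routing overhead and memory bandwidth for the full 26B parameter set still apply.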

Deployment support is comprehensive from day one: HuggingFace Transformers, llama.cpp, MLX for Apple Silicon, transformers.js for browser/WebGPU, ONNX for edge, and mistral.rs. Fine-tuning is supported via TRL and Vertex AI. This is the first major open-weight multimodal release that runs on consumer hardware (Apple Silicon M-series with MLX) with a 256K context window, which expands the practical use cases considerably for indie builders and researchers without GPU clusters.
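Whether the 31B model actually fits on a given consumer machine mostly comes down to weight memory: roughly parameters times bytes per parameter, plus KV cache. A rough sizing sketch, assuming simple uniform quantization and ignoring KV-cache growth at long context (which is substantial at 256K):

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"31B @ {bits}-bit: ~{weight_gb(31e9, bits):.1f} GB")
# 16-bit: ~62 GB, 8-bit: ~31 GB, 4-bit: ~15.5 GB, so a 4-bit quant
# fits on higher-memory Apple Silicon machines before KV cache.
```

This is why 4-bit quantized builds (llama.cpp GGUF, MLX quantized weights) are the usual path to running models of this size locally.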

For the open-source community, Gemma 4 lands as one of the most capable freely deployable model families ever released. The Apache 2.0 license removes the usage restrictions that limited earlier Gemma versions, and the combination of multimodality, long context, and tool calling puts Gemma 4 in genuine competition with proprietary APIs for a wide range of production workloads.

Panel Takes

The Builder

Developer Perspective

This is the release I've been waiting for. Apache 2.0 multimodal with 256K context that runs on Apple Silicon via MLX — that's a production-grade local model for the first time. The 26B MoE hitting near-31B quality at 4B active params is the efficiency story of the quarter. I'm migrating API calls to self-hosted Gemma 4 immediately.

The Skeptic

Reality Check

The LMArena Elo scores are self-reported and the benchmark selection favors Google's strengths. Real-world coding and instruction-following quality won't be clear until the community has had a few weeks with it. The audio multimodality is also notably limited to the two smaller models, which feels like a deliberate capability hold-back.

The Futurist

Big Picture

Gemma 4 is a statement: Google is committed to open weights at the frontier. Apache 2.0 on a 31B multimodal model with 256K context normalizes a new baseline for what 'open' means in AI. The downstream effect is that every app built on proprietary vision APIs will get pressured to justify the cost premium over a self-hosted Gemma 4.