Google DeepMind Releases Gemma 3 — 27B Open-Weights Model Tops Llama 4 Scout on Key Benchmarks
Google DeepMind released Gemma 3, a family of open-weights models up to 27B parameters, with the flagship outperforming Meta's Llama 4 Scout on several benchmarks while being significantly smaller.
Original sourceGoogle DeepMind has released Gemma 3, the latest generation of its open-weights model family. The flagship Gemma 3 27B model sets a new standard for what's achievable at sub-30B parameter scale, outperforming Meta's Llama 4 Scout on MMLU, HumanEval, and MATH benchmarks while using substantially fewer parameters and being runnable on a single high-end consumer GPU.
The Gemma 3 family spans 1B, 4B, 12B, and 27B parameter sizes, with the smaller models specifically optimized for on-device and mobile deployment. The 4B model runs comfortably on modern smartphone hardware at acceptable latency for interactive use cases. All models are released under Google's Gemma license, which permits commercial use with attribution.
Key architectural improvements include an extended 128K context window across the full model family (previous Gemma versions topped out at 8K for base models), improved instruction following, and substantially better multilingual performance across 35+ languages. Google has also released fine-tuned instruction and safety variants alongside the base weights.
The timing is notable: Gemma 3 arrives just weeks after Meta's Llama 4 Scout generated significant excitement as an open-weights frontier model. By shipping a smaller model that beats Scout on several benchmarks, Google is directly challenging the narrative that Meta owns the open-weights space. For the open-source AI community, more competition at the frontier of open-weights models means faster progress and more deployment options.
Gemma 3 models are available immediately on Hugging Face, Google AI Studio, and Vertex AI, with Ollama and LM Studio support expected within hours of release.
Panel Takes
The Builder
Developer Perspective
“27B that beats Scout is significant for self-hosted deployments — this now fits on a single A100 or two RTX 4090s while matching or exceeding Llama 4 Scout on the benchmarks I care about. The 128K context across the whole family is the feature I've been waiting for since Gemma 2.”
The Skeptic
Reality Check
“Benchmark performance rarely translates linearly to real-world task completion. Gemma's previous generations had notable instruction-following gaps that didn't show up in aggregate numbers. I'll wait for the community to put Gemma 3 through its paces on actual production workloads before crowning it the Llama 4 Scout killer.”
The Futurist
Big Picture
“The open-weights frontier is advancing faster than anyone expected 18 months ago. A 27B model competitive with much larger systems is an inflection point for edge AI: devices that can run frontier-class intelligence locally, without cloud dependence. Gemma 3's on-device focus positions Google well for the coming wave of AI-native mobile applications.”