Meta Released Llama 4 — Open-Weight Multimodal MoE Models With a 10 Million Token Context Window
Meta released Llama 4 Scout and Maverick on April 5, 2025 — the first open-weight natively multimodal Mixture-of-Experts models. Scout runs on a single H100 with a 10M context window; Maverick benchmarks against GPT-4o on reasoning and coding.
Meta dropped what may be the most significant open-source AI release of 2025: Llama 4, a family of natively multimodal Mixture-of-Experts models available for free download.
**Scout and Maverick: Different tools for different jobs**
Llama 4 Scout is the efficiency pick: 17 billion active parameters, 16 experts, and it fits on a single NVIDIA H100 GPU (with Int4 quantization). Its headline feature is a 10-million-token context window, the largest available in any open-weight model. Scout was trained on roughly 40 trillion tokens and delivers performance comparable to models 5-10x its size on most benchmarks.
Llama 4 Maverick is the capability pick: also 17B active parameters, but with 128 experts, for a total of roughly 400 billion parameters. Maverick benchmarks comparably to GPT-4o and DeepSeek V3 on reasoning and coding tasks, and offers a 1-million-token context window.
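The active-vs-total parameter split comes from sparse expert routing: a gate picks a small number of experts per token, so only a fraction of the model's weights run on any given input. A toy sketch of top-k routing (illustrative dimensions and expert counts, not Meta's implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    """Route one token to its top-k experts via softmax gating.

    x:       (d,) token hidden state
    gate_w:  (n_experts, d) router weights
    experts: list of (W, b) per-expert linear layers, W: (d, d)

    Only k experts execute per token, so the *active* parameter
    count stays small even when the *total* expert count is large.
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # renormalize gate weights over the top-k
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        W, b = experts[idx]
        out += w * (W @ x + b)           # weighted sum of selected expert outputs
    return out

# Toy setup: 8 experts, only 1 runs per token (Scout-style sparsity).
rng = np.random.default_rng(0)
d, n = 4, 8
gate = rng.normal(size=(n, d))
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n)]
y = moe_forward(rng.normal(size=d), gate, experts, k=1)
```

With 8 experts and k=1, each token touches one expert's weights; scaling the expert list to 16 (Scout) or 128 (Maverick) grows total capacity without growing per-token compute.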
Both models are natively multimodal: they process text, images, and video inputs out of the box, without the stitched-together multimodal adapters that earlier open models relied on.
**Why this changes things**
Before Llama 4, organizations that needed multimodal AI at scale had essentially two choices: pay for GPT-4o, Claude, or Gemini, or accept meaningfully lower quality from open alternatives. Llama 4 closes that gap substantially.
For researchers, non-profits, and organizations in countries that can't or won't depend on US cloud providers for sovereign AI capability, Llama 4 is a genuine strategic unlock. The 10M context window on Scout is unprecedented for open weights and enables use cases — full codebase analysis, entire legal document review, long-form scientific literature synthesis — that were previously gated behind expensive proprietary APIs.
**Caveats and limits**
Llama 4 ships under Meta's Llama 4 Community License, which carries more restrictions than an OSI-approved open-source license (notably, companies with more than 700 million monthly active users need a separate license from Meta). And while the benchmarks are impressive, frontier models from Anthropic and OpenAI still lead on the most complex reasoning tasks. But for 80% of production workloads, the economics now decisively favor Llama 4.
**Panel Takes**

**The Builder: Developer Perspective**
“Llama 4 Scout on a single H100 with 10M context is the production deployment story I've been waiting for. We're migrating three internal tools from GPT-4o this week — the cost savings alone justify it, and the multimodal capability is genuinely on par for our use cases.”
**The Skeptic: Reality Check**
“Meta's Community License isn't OSI-approved open source — it's open weights with strings attached. Companies with large user bases need to read the fine print carefully. And benchmark performance doesn't always translate to production reliability. Run your actual workloads before migrating.”
**The Futurist: Big Picture**
“Llama 4 continues Meta's extraordinary strategy of giving away foundational AI infrastructure to commoditize the competition. Every time a model this capable becomes freely available, it shifts power from model providers to application builders — and that's exactly the world Meta profits from.”