DeepSeek / Simon Willison · Launch · 2026-04-26

DeepSeek V4 Drops Two MIT Models — 1.6T Parameters, 1M Context, Frontier Performance

DeepSeek released V4-Pro (1.6T parameters, 49B active) and V4-Flash (284B parameters, 13B active) on April 24, both MIT-licensed with 1M token context — and V4-Pro now tops every open-source model on math and coding benchmarks.

## DeepSeek Does It Again

One year after DeepSeek V2 rattled the AI industry with near-frontier performance at a fraction of the inference cost, the Chinese lab has done it again. DeepSeek V4 dropped April 24, 2026 as two production-ready Mixture-of-Experts models: V4-Pro (1.6 trillion parameters, 49 billion activated per forward pass) and V4-Flash (284 billion parameters, 13 billion activated). Both ship under the MIT license and support 1 million token context windows.
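To put the Mixture-of-Experts numbers in perspective, here is a minimal sketch of how small the activated slice is relative to each model's total size. The parameter counts come from the paragraph above; the `MoEModel` class and its field names are illustrative, not anything DeepSeek ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    """Illustrative container; names are hypothetical, figures are from the article."""
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters activated per forward pass, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the model's weights used for a single forward pass."""
        return self.active_params_b / self.total_params_b


MODELS = [
    MoEModel("V4-Pro", total_params_b=1600.0, active_params_b=49.0),
    MoEModel("V4-Flash", total_params_b=284.0, active_params_b=13.0),
]

for model in MODELS:
    print(f"{model.name}: {model.active_fraction:.1%} of weights active per forward pass")
```

On these figures, V4-Pro activates roughly 3% of its weights per forward pass and V4-Flash roughly 4.6%, which is why per-token compute tracks the active count rather than the headline parameter count.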

## The Efficiency Story

The headline capability is remarkable, but the architecture is the real news. DeepSeek's hybrid attention mechanism — Compressed Sparse Attention (CSA) combined with Heavily Compressed Attention (HCA) — dramatically cuts the cost of long-context inference. At 1M tokens, V4-Pro requires just 27% of the single-token FLOPs and 10% of the KV cache compared to DeepSeek V3.2. That's not an incremental improvement; it's a step-change in what long-context inference costs to serve.
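The two ratios above are "fraction of baseline" figures, which are easy to misread as reductions. The sketch below converts them and shows what the KV cache ratio could mean for serving. The 100 GB baseline is a hypothetical number chosen for illustration, not a published V3.2 figure, and the "sequences per unit of memory" line assumes KV cache is the only memory constraint.

```python
# Ratios quoted in the article for V4-Pro vs. DeepSeek V3.2 at 1M tokens.
FLOPS_RATIO = 0.27     # V4-Pro needs 27% of the single-token FLOPs
KV_CACHE_RATIO = 0.10  # V4-Pro needs 10% of the KV cache


def reduction(ratio: float) -> float:
    """Convert a 'fraction of baseline' figure into a reduction."""
    return 1.0 - ratio


def scaled_kv_cache_gb(baseline_kv_gb: float) -> float:
    """KV cache size after applying the quoted ratio to a baseline size."""
    return baseline_kv_gb * KV_CACHE_RATIO


# Hypothetical baseline: NOT a published V3.2 figure, used only for illustration.
HYPOTHETICAL_BASELINE_KV_GB = 100.0

print(f"FLOPs reduction: {reduction(FLOPS_RATIO):.0%}")
print(f"KV cache reduction: {reduction(KV_CACHE_RATIO):.0%}")
print(
    f"KV cache for a {HYPOTHETICAL_BASELINE_KV_GB:.0f} GB baseline: "
    f"{scaled_kv_cache_gb(HYPOTHETICAL_BASELINE_KV_GB):.0f} GB"
)
# Assumes KV cache is the only memory limit on concurrent sequences.
print(f"Sequences per unit of KV memory: {1 / KV_CACHE_RATIO:.0f}x")
```

Read this way, 27% of the FLOPs is a 73% reduction, and 10% of the KV cache is a 90% reduction, or about ten times as many long-context sequences in the same cache memory under the stated assumption.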

## Where It Stands

On coding and math benchmarks, DeepSeek V4-Pro beats every available open-source model. On world knowledge, the only model it trails is Google's Gemini 3.1-Pro, which is closed-source and several times the cost. V4-Flash also performs competitively on world knowledge tasks while costing a fraction of V4-Pro to serve.

## What the MIT License Actually Means

The MIT license is one of the most permissive open-source options available: it allows commercial use, redistribution, and modification, requiring only that the copyright and license notice be preserved. Combined with Hugging Face hosting, this means any team with the hardware can run V4-Flash locally with no legal friction. V4-Flash, with 13B active parameters, is already running on high-end workstations with quantization.
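"With the hardware" deserves a quick back-of-the-envelope check. The sketch below estimates weight storage for V4-Flash at a few common bit widths. It is a rough estimate under stated assumptions: it counts weights only (no KV cache, activations, or quantization metadata), and it uses the total parameter count, since in a standard MoE deployment every expert has to be resident even though only 13B parameters are active per forward pass. The function name is illustrative.

```python
V4_FLASH_TOTAL_PARAMS_B = 284.0  # total parameters in billions, from the article


def weight_memory_gb(total_params_b: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    """
    bytes_per_weight = bits_per_weight / 8
    return total_params_b * bytes_per_weight


for bits in (16, 8, 4):
    gb = weight_memory_gb(V4_FLASH_TOTAL_PARAMS_B, bits)
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions, 4-bit quantization brings the weights to roughly 142 GB, down from about 568 GB at 16 bits, which is the difference between a multi-node deployment and a single large-memory machine.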

## The China Factor

DeepSeek is a Hangzhou-based lab that has now produced three successive generations of models that have materially moved the frontier/cost ratio. The geopolitical dimension is becoming impossible to ignore: Chinese open-source AI is compressing the advantage Western frontier labs spent billions to build, and doing it with published architectures under permissive licenses.

Panel Takes

The Builder

Developer Perspective

Needing just 27% of the FLOPs at 1M context, a 73% reduction, is the stat I'll be citing for the rest of the year. That's the number that makes million-token context an everyday tool rather than an expensive experiment. V4-Flash is going straight into my stack.

The Skeptic

Reality Check

The benchmark lead is real but narrow, and DeepSeek's API has had availability issues during previous launches. The geopolitical risk of a China-based model provider is a genuine compliance concern for enterprise teams: an MIT license covers the weights, but it doesn't solve data residency for anyone calling the hosted API.

The Futurist

Big Picture

DeepSeek V4 is evidence that the frontier is no longer a walled garden. When a lab outside the Silicon Valley bubble can ship MIT-licensed models that match GPT-5.5 on coding, the entire pricing structure of AI APIs becomes negotiable.
