Codestral 2
Mistral's 22B Apache 2.0 code model beats GPT-4o on HumanEval
The Panel's Take
Codestral 2 is Mistral AI's second-generation code-specialized model, released under the Apache 2.0 license with 22 billion parameters. It ships with native fill-in-the-middle (FIM) support, context up to 256K tokens, and benchmarks that outperform GPT-4o on both HumanEval and MBPP according to Mistral's internal evals — a significant claim for an open-weight model. The model is designed for three primary use cases: inline code completion (with FIM), multi-file code generation with long context, and agentic coding tasks where the model needs to reason about large codebases. Mistral has also optimized it specifically for the most popular languages of 2026: Python, TypeScript, Go, Rust, and SQL. Integration support covers Cursor, Continue.dev, VS Code, and direct API access via the Mistral API and HuggingFace. For the open-source community, Codestral 2 arrives at the right moment. The local LLM coding space has been dominated by Qwen3-Coder variants, and Codestral 2 offers a Western-lab alternative with a permissive license, strong fill-in-the-middle performance, and a model size that fits comfortably on a single A100 or dual consumer GPUs at Q4 quantization.
Share this verdict
Codestral 2 verdict: SHIP 🚀 3 ships · 1 skip from the expert panel Full review: shiporskip.io/tool/codestral-2-mistral-22b-apache-code-completion-fill-in-middle-2026
Weekly AI Tool Verdicts
Get the next verdict in your inbox
7 critics review a new AI tool every day. Weekly digest — free.
Compare Codestral 2 with Others
Embed this verdict
Tool makers can add a live ShipOrSkip badge to their site. Badge loads track impressions; clicks route back to this review.
<a href="https://shiporskip.io/api/badge-click/codestral-2-mistral-22b-apache-code-completion-fill-in-middle-2026" target="_blank" rel="noopener"><img src="https://shiporskip.io/api/badge/codestral-2-mistral-22b-apache-code-completion-fill-in-middle-2026" alt="Codestral 2 Ship verdict on ShipOrSkip" width="360" height="90" /></a>[](https://shiporskip.io/api/badge-click/codestral-2-mistral-22b-apache-code-completion-fill-in-middle-2026)<iframe src="https://shiporskip.io/embed/codestral-2-mistral-22b-apache-code-completion-fill-in-middle-2026" title="Codestral 2 ShipOrSkip verdict" width="360" height="260" style="border:0;border-radius:16px;max-width:100%;" loading="lazy"></iframe>The reviews
“Apache 2.0 + fill-in-the-middle + 256K context is the trifecta I've been waiting for in a locally-runnable code model. The HumanEval numbers are believable based on my early testing — it's genuinely competitive with GPT-4o on completion tasks, which is remarkable at this size and license.”
“Mistral's benchmarks are self-reported and the comparison methodology isn't fully disclosed. I'd want independent evaluation before trusting 'beats GPT-4o' claims — especially since Mistral's previous eval comparisons have been questioned. Also, 22B at full precision still requires significant GPU memory that most indie developers don't have.”
“A truly permissive, high-quality code model changes the economics of AI-assisted development for enterprises with data privacy requirements. The real story here isn't beating GPT-4o on benchmarks — it's enabling companies that can't send code to external APIs to finally have a competitive option they can run on-premise.”
“For the growing community of creators building with AI coding tools, having a locally-runnable model with this quality means your code stays on your machine. The Cursor integration makes it plug-and-play, which lowers the barrier to trying it significantly.”