GLM-5.1

#1 on SWE-Bench Pro: Zhipu's open 754B MoE beats GPT-5.4 on coding

Price: Open Source / MIT · Reviewed: 2026-04-12

Expert verdict

Skip

2-2 (2 Ships · 2 Skips)

The Panel's Take

Z.ai (formerly Zhipu AI) has released GLM-5.1, a 754B-parameter Mixture-of-Experts (MoE) model that currently sits at #1 on SWE-Bench Pro with a reported score of 58.4, ahead of GPT-5.4 and Claude Opus 4.6 on long-horizon software engineering tasks. The model ships under the MIT license with full weights on HuggingFace.

GLM-5.1 was designed specifically for agentic software engineering workflows: multi-file reasoning, autonomous test-run-fix loops, and extended coding sessions that span hundreds of tool calls. It is not just a capability leap. Although the model has 754B total parameters, sparse MoE routing activates only a subset of them for each token, so on a sufficiently provisioned cluster it can run more efficiently than a dense model of equivalent capability.

The SWE-Bench Pro result is significant because that benchmark is harder to game than SWE-Bench Verified. It tests whether a model can resolve real GitHub issues with correct tests, proper diffs, and no regressions, which are the things that actually matter in production. For anyone running self-hosted coding agents or building on open models, GLM-5.1 is the new baseline to beat.

Skip · 5.0/10

The reviews

If the SWE-Bench Pro numbers hold up under independent replication, this is the first open model that can genuinely replace a proprietary API for serious agentic coding work. MIT license means you can fine-tune and deploy on your own infra. This is a big deal.

754B parameters is not something 99% of developers can run locally. You need a multi-GPU cluster or serious cloud spend. The benchmark numbers are from Z.ai's own evaluations, and Zhipu has a history of optimistic benchmarking. Wait for independent replications.
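
To put the hardware concern in rough numbers, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. The 754B figure comes from the review; the 80 GB per-GPU capacity and the precision options are illustrative assumptions, and the estimate ignores KV cache, activations, and framework overhead, so real requirements would be higher.

```python
import math

def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9

def min_gpus(total_params: float, bytes_per_param: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(total_params, bytes_per_param) / gpu_memory_gb)

TOTAL_PARAMS = 754e9   # total parameter count quoted in the review
GPU_MEMORY_GB = 80     # assumption: an 80 GB accelerator

# Assumed storage precisions (bytes per parameter); not taken from the review.
PRECISIONS = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for name, nbytes in PRECISIONS.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    gpus = min_gpus(TOTAL_PARAMS, nbytes, GPU_MEMORY_GB)
    print(f"{name}: {gb:.0f} GB of weights, at least {gpus} GPUs")
```

Under these assumptions, 16-bit weights alone come to about 1.5 TB. Sparse MoE routing lowers the compute per token, but in a typical serving setup every expert still has to be resident in memory, so the total parameter count is what drives the footprint.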

A Chinese lab shipping an MIT-licensed model that tops global coding benchmarks is a watershed moment for open-source AI. The geopolitical implications are real — this is the model that makes US export controls look strategically shortsighted.

Unless you're building coding tools or agent infrastructure, a 754B MoE model doesn't move the needle for creative applications. The energy and infra overhead for creative use cases doesn't pencil out versus smaller, cheaper models.
