GLM-5.1

#1 on SWE-Bench Pro — 744B MoE model that runs autonomously for 8 hours

Price: API (pricing TBD) · Reviewed: 2026-04-07

Expert verdict

Skip

2-2
2 Ships · 2 Skips
Visit z.ai

The Panel's Take

GLM-5.1 is Z.AI's post-training upgrade of the 744B Mixture-of-Experts GLM-5 model. It has claimed the top spot on SWE-Bench Pro with a score of 58.4, ahead of GPT-5.4 (57.7), Claude Opus 4.6 (57.3), and Gemini 3.1 Pro (54.2). That is a 0.7-point lead over second place.

The model is designed for long-horizon agentic tasks and is reported to run autonomously for up to 8 hours across thousands of iterations on a single problem. Its agentic capabilities include extended context retention, tool-calling with recovery loops, and a reinforcement-trained "persistence" mode that keeps the model on-task through failures and dead ends rather than surfacing errors to the user.

The model was trained entirely on Huawei Ascend 910B chips using the MindSpore framework, with no US silicon and no CUDA. The geopolitical dimension is as significant as the technical one: the panel reads GLM-5.1 as direct evidence that US export controls on Nvidia hardware have not meaningfully slowed China's frontier model development. The 8-hour autonomous execution window would also be a step-change from current agentic systems, which struggle past 20-30 minutes of coherent work. If the claim holds up in real-world testing, it is a genuine advance in the class of problems AI agents can solve independently.
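To make "tool-calling with recovery loops" under a fixed time budget more concrete, here is a minimal sketch of the general pattern. It is an illustration only and says nothing about how Z.AI implements GLM-5.1: `run_with_recovery`, `step`, `is_done`, and `flaky_step` are hypothetical names, and no real model or API is called.

```python
import time
from typing import Callable, Optional


def run_with_recovery(
    step: Callable[[int, Optional[str]], str],
    is_done: Callable[[str], bool],
    budget_seconds: float,
    max_iterations: int,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Hypothetical agent loop: retry `step` until success, timeout, or iteration cap.

    A failed step is not raised to the caller. Its error text is passed
    into the next call so the agent can try a different approach.
    """
    deadline = clock() + budget_seconds
    last_error: Optional[str] = None
    failures = 0
    for iteration in range(max_iterations):
        if clock() >= deadline:
            return {"status": "timeout", "iterations": iteration, "failures": failures}
        try:
            result = step(iteration, last_error)
        except Exception as exc:
            # Recovery: remember the failure and keep going.
            failures += 1
            last_error = f"{type(exc).__name__}: {exc}"
            continue
        last_error = None
        if is_done(result):
            return {
                "status": "done",
                "iterations": iteration + 1,
                "failures": failures,
                "result": result,
            }
    return {"status": "exhausted", "iterations": max_iterations, "failures": failures}


def flaky_step(iteration: int, last_error: Optional[str]) -> str:
    """Stand-in for a tool call that fails twice before succeeding."""
    if iteration < 2:
        raise RuntimeError("tests failed")
    return "patch applied"


outcome = run_with_recovery(
    flaky_step,
    is_done=lambda result: result == "patch applied",
    budget_seconds=8 * 60 * 60,  # the 8-hour window from the review
    max_iterations=1000,
)
print(outcome)
```

The injectable `clock` parameter is a design choice for testability: a fake clock lets the timeout path be exercised without waiting for real time to pass.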

Skip · 5.0/10
The reviews

If the 8-hour autonomous execution claim is real and not cherry-picked, this changes the calculus for using AI on genuinely hard engineering problems. SWE-Bench Pro #1 is also a credible metric — I want to test this on my own repos immediately.

SWE-Bench benchmarks have historically shown poor correlation with real-world coding productivity, and the '8-hour autonomous' claim needs independent validation. Z.AI is also a relatively unknown quantity compared to Anthropic or Google — API reliability and pricing are completely unproven.

The strategic significance of a Chinese lab hitting #1 on the coding benchmark using zero US hardware cannot be overstated. The export control strategy is officially not working as intended, and GLM-5.1 will accelerate the geopolitical AI arms race in ways that reshape the entire industry.

For creative work, I need a model with strong multimodal capabilities and reliable API access — both unproven for GLM-5.1. The coding benchmark lead is impressive but not directly relevant to my workflows. I'll wait for independent reviews before switching.
