Zhipu's GLM-5.1 Tops SWE-Bench Pro With MIT License — And Not a Single Nvidia GPU
Zhipu AI released GLM-5.1, a 754B-parameter MoE model (40B active per token) that scores 58.4 on SWE-Bench Pro — beating GPT-5.4 and Claude Opus 4.6. It's MIT licensed, trained entirely on Huawei Ascend 910B chips, and can autonomously run a plan→execute→test→fix→optimize loop for up to 8 hours.
Zhipu AI's GLM-5.1, released April 7, 2026, has done something that seemed unlikely six months ago: topped the SWE-Bench Pro leaderboard with an open-weight model carrying an MIT license. The 754-billion-parameter Mixture-of-Experts model activates only 40 billion parameters per token, so while the full weights must still be resident on high-memory server hardware, per-token inference compute stays close to that of a 40B dense model.
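The sparse-activation arithmetic can be illustrated with a toy top-k router. This is a generic MoE sketch, not Zhipu's actual routing scheme (which has not been published); the dimensions and function names are illustrative:

```python
import numpy as np

def route_tokens(hidden, gate_w, k=2):
    """Toy top-k MoE router: each token picks its k highest-scoring
    experts, so only a fraction of total parameters runs per token."""
    logits = hidden @ gate_w                      # (tokens, experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]    # indices of chosen experts
    # Softmax-normalize gate scores over the selected experts only.
    scores = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return topk, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 16))   # 4 tokens, hidden size 16
gate_w = rng.standard_normal((16, 8))   # 8 experts
experts, weights = route_tokens(hidden, gate_w, k=2)
print(experts.shape)                    # each token runs 2 of 8 expert FFNs
```

In a real deployment the selected expert feed-forward blocks are then run and their outputs combined with these weights; the compute saving comes from skipping the other experts entirely.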
The SWE-Bench Pro score of 58.4 edges out GPT-5.4 (57.7) and Claude Opus 4.6 (57.3), the two reigning frontier models in AI-assisted software engineering. For context, SWE-Bench Pro tests models on real GitHub issues from production codebases, requiring the model to navigate unfamiliar code, reproduce bugs, and submit working patches. It's considered the hardest real-world coding benchmark currently in use.
What makes the release more remarkable is the hardware story: GLM-5.1 was trained entirely on Huawei Ascend 910B chips, with zero Nvidia involvement. This is the first frontier-class model trained outside the Nvidia ecosystem at this scale, and it directly validates China's domestic AI chip strategy in the face of US export restrictions. Zhipu explicitly noted the Ascend-only stack in their technical report.
The model ships with an 8-hour autonomous agent mode: given a task, it will autonomously plan, implement, run tests, diagnose failures, fix them, and optimize, cycling without human intervention until time runs out or a confidence threshold is met. Early evals show it resolving 4-6 engineering tasks per 8-hour run, a workload that would typically occupy a mid-level developer for a full day.
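The loop described above can be sketched as a budgeted retry cycle. Everything here is a hypothetical stand-in (the callables, the confidence signal, the stop conditions); Zhipu has not published the agent mode's actual interface:

```python
import time

def autonomous_run(task, plan, execute, run_tests, fix, optimize,
                   budget_s=8 * 3600, confidence_target=0.95):
    """Plan -> execute -> test -> fix -> optimize loop with a wall-clock
    budget. Each callable is a placeholder for a model-driven step."""
    deadline = time.monotonic() + budget_s
    patch = execute(plan(task))
    while time.monotonic() < deadline:
        passed, confidence = run_tests(patch)
        if passed and confidence >= confidence_target:
            return optimize(patch)        # final polish pass, then stop
        patch = fix(patch)                # diagnose the failure and retry
    return patch                          # budget exhausted: best effort

# Minimal usage with stub steps: one failing attempt, then a fixed patch.
plan = lambda task: ["reproduce bug", "write patch"]
execute = lambda steps: "patch-v0"
run_tests = lambda patch: (patch == "patch-v1", 0.99)
fix = lambda patch: "patch-v1"
optimize = lambda patch: patch + "-opt"
result = autonomous_run("close issue #…", plan, execute, run_tests,
                        fix, optimize, budget_s=5)
print(result)
```

The interesting design question is the exit condition: stopping on a confidence threshold rather than just passing tests is what lets the optimize pass run only when the agent believes the patch is sound.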
The MIT license means GLM-5.1 is immediately available for commercial use, fine-tuning, and redistribution without royalties. Weights are hosted on Hugging Face. The LocalLLaMA community has already begun quantization efforts, with early Q4_K_M quants running on 4x A100 setups at ~12 tokens/sec.
Panel Takes
The Builder
Developer Perspective
“MIT license plus SWE-Bench #1 is the combination I've been waiting for. I can fine-tune this on my company's codebase, run it on our own infra, and not owe anyone a penny. The Ascend training story is interesting but what I care about is: does it close my GitHub issues? Apparently yes.”
The Skeptic
Reality Check
“SWE-Bench is a benchmark, and benchmarks get gamed. Zhipu has strong incentive to optimize for SWE-Bench specifically, and a 0.7-point margin is within overfitting range. The 8-hour autonomous loop is also untested at scale — I'd want to see it on diverse production codebases before trusting it with anything important.”
The Futurist
Big Picture
“The Ascend-only training stack is the story that will matter in five years. China just proved it can train a frontier model without Nvidia, without CUDA, without the US supply chain. This isn't just a model release — it's a geopolitical data point about the durability of chip export controls as an AI containment strategy.”