GLM-5.1 Becomes First Open-Source Model to Lead SWE-Bench Pro — Trained on 100K Huawei Chips, Zero Nvidia

Z.ai (formerly Zhipu AI) released GLM-5.1 on April 7, 2026 — a 744B open-weight MoE model that scored 58.4% on SWE-bench Pro, making it the first open-source model ever to claim the top spot. Trained entirely on Huawei Ascend 910B chips without a single Nvidia GPU, it's a milestone for both open-source AI and chip independence.

On April 7, 2026, Z.ai (the company formerly known as Zhipu AI) released GLM-5.1 — a 744-billion-parameter open-weight model that became the first open-source system ever to lead SWE-bench Pro. With a score of 58.4%, it edged past Claude Opus 4.6 (57.3%) and GPT-5.4 (57.7%) on the benchmark that has become the most closely watched measure of AI coding agent capability.

The architecture is a Mixture-of-Experts design with 40 billion active parameters per token, so only a small share of the full 744B is in play on any forward pass. The model carries a 200,000-token context window and a 131,072-token maximum output length, which puts its capacity for long-horizon tasks in line with frontier models. The weights are released under the MIT license, which permits commercial use provided the copyright and license notice are retained.
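To put the active-versus-total parameter split in numbers, here is a minimal sketch that uses only the two figures quoted above (744B total, 40B active per token). The function and constant names are illustrative and do not come from Z.ai; the calculation says nothing about how experts are routed, only what share of the weights participates in each token.

```python
# Figures quoted in the article; names below are illustrative assumptions.
TOTAL_PARAMS = 744e9   # total parameters across all experts
ACTIVE_PARAMS = 40e9   # parameters active for any single token


def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters used per token in an MoE model."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share per token: {frac:.1%}")  # prints "Active share per token: 5.4%"
```

Roughly one parameter in nineteen is exercised per token, which is why per-token compute tracks the 40B figure rather than the 744B headline, even though the full set of weights still has to be stored and served.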

The training story is at least as significant as the benchmark result. GLM-5.1 was trained on approximately 100,000 Huawei Ascend 910B chips using the MindSpore deep learning framework — without any Nvidia hardware at any stage. This directly challenges the CUDA monoculture thesis that has dominated AI infrastructure thinking and policy for years. If GLM-5.1's approach is reproducible and scalable, it is a meaningful demonstration that frontier-tier training is not intrinsically Nvidia-dependent.

Z.ai is publicly traded. The company completed a Hong Kong IPO in January 2026, raising approximately HKD 4.35 billion (about US$558 million) and becoming the first publicly listed pure-play foundation model company in the world.

The model is available as a free download from Hugging Face and via API at $0.95 per million input tokens. In agentic demonstrations, GLM-5.1 has run autonomously for eight continuous hours, executing 655 iterations of planning, execution, testing, and optimization without a human checkpoint.
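For a rough sense of scale, the sketch below turns the quoted input price and the eight-hour demonstration into back-of-the-envelope numbers. It assumes flat per-token billing at the stated $0.95 per million input tokens; output-token pricing is not given in the article and is deliberately left out. The helper names are hypothetical.

```python
# Input price quoted in the article; output pricing is not stated and is omitted.
INPUT_PRICE_PER_MILLION_USD = 0.95


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Estimate input-side API cost, assuming flat per-token billing."""
    return input_tokens / 1_000_000 * price_per_million


def seconds_per_iteration(hours: float, iterations: int) -> float:
    """Average wall-clock time per agent iteration over a run."""
    return hours * 3600 / iterations


# One request that fills the 200,000-token context window with input.
print(f"Full-context request: ${input_cost_usd(200_000):.2f}")

# The eight-hour, 655-iteration demonstration run.
print(f"Average iteration: {seconds_per_iteration(8, 655):.0f} s")
```

Under these assumptions a single full-context prompt costs about 19 cents on the input side, and the demonstration averaged roughly 44 seconds per plan-execute-test-optimize cycle.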

Caveats are worth noting: on the broader coding composite benchmark (which combines Terminal-Bench 2.0 and NL2Repo alongside SWE-bench), Claude Opus 4.6 retains the lead at 57.5 versus GLM-5.1's 54.9. The SWE-bench Pro headline is accurate but context-dependent. And for organizations with data sovereignty or compliance requirements tied to model provenance, a Chinese-jurisdiction API presents real questions that go beyond benchmarks.

Panel Takes

The Builder

Developer Perspective

“MIT license, top SWE-bench Pro score, $0.95/M API pricing — this is a serious production option for code agent workloads. The 8-hour autonomous run story is the detail that makes long-horizon task builders pay attention. Evaluate it seriously before defaulting to a more expensive incumbent.”

The Skeptic

Reality Check

“The broader coding composite still has Claude Opus 4.6 ahead, so 'top SWE-bench Pro' is somewhat cherry-picked. The Huawei chip training claim needs independent verification before it drives infrastructure decisions. And Chinese jurisdiction API access will trigger compliance blockers for many regulated organizations, full stop.”

The Futurist

Big Picture

“Training a frontier model on Huawei hardware without Nvidia is geopolitically consequential. If this is reproducible at scale, it directly challenges the GPU-chokepoint narrative that has been shaping export controls, supply chain strategy, and national AI policy for years. This isn't just a benchmark story — it's an infrastructure independence proof of concept.”
