Z.ai / Zhipu AI | Model Release | 2026-04-12

Z.ai's GLM-5.1 Claims #1 on SWE-Bench Pro — The First Open Model to Beat GPT-5 on Real Software Engineering

Z.ai (formerly Zhipu AI) has released GLM-5.1, a 754B-parameter Mixture-of-Experts model under MIT license that claims the #1 position on SWE-Bench Pro with a score of 58.4 — outperforming GPT-5.4 and Claude Opus 4.6. The full weights are available on HuggingFace.

Original source

Z.ai, the Beijing-based lab formerly known as Zhipu AI, has released GLM-5.1 — a 754B-parameter Mixture-of-Experts large language model that claims the top position on SWE-Bench Pro, the harder variant of the industry-standard software engineering benchmark. The model scores 58.4, reportedly outperforming OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6.

The full weights are available on HuggingFace under the MIT license, making GLM-5.1 the highest-performing open model for software engineering tasks by a significant margin. Z.ai designed the model specifically for long-horizon agentic coding: multi-file reasoning, autonomous test-run-fix loops, and extended sessions spanning hundreds of tool calls.

SWE-Bench Pro is a more rigorous benchmark than the standard SWE-Bench Verified: it tests whether models can resolve real GitHub issues by producing patches that make the target tests pass without regressing existing ones, across a broader set of repositories. Gaming SWE-Bench Pro through benchmark contamination is harder, making the result more credible than many model-announcement claims.

The geopolitical subtext here is significant. A Chinese lab releasing the most capable open coding model under MIT license creates immediate strategic complications for US export control policy, which has focused on restricting chip exports but has no mechanism for restricting model weights that are already public. The model also represents a substantial advance in what's available without going through proprietary APIs, with direct implications for companies trying to keep sensitive codebases on-premises.

The practical constraint is scale: a 754B-parameter model needs either a multi-GPU server or substantial cloud compute. Z.ai offers API access through its platform, but self-hosting demands serious infrastructure.
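Back-of-envelope arithmetic makes the scale concrete. Weight memory is roughly parameters times bytes per parameter, so 754B parameters is about 754 GB at 8-bit precision and about 1.5 TB at 16-bit, before KV cache and activations. A rough estimator (the 20% overhead factor and 80 GB GPU size are assumptions, not measured figures):

```python
import math

def gpus_needed(
    params_b: float,            # parameters, in billions
    bytes_per_param: float,     # 1.0 for FP8/INT8, 2.0 for FP16/BF16
    gpu_gb: float = 80.0,       # per-GPU memory (assumes 80 GB cards)
    overhead: float = 1.2,      # ~20% headroom for KV cache etc. (assumption)
) -> int:
    """Rough count of GPUs needed just to hold the weights in memory.
    1e9 params * bytes/param = GB, since 1e9 bytes ~ 1 GB here."""
    weight_gb = params_b * bytes_per_param
    return math.ceil(weight_gb * overhead / gpu_gb)
```

Under these assumptions, serving GLM-5.1's full weights takes on the order of a dozen 80 GB GPUs at 8-bit precision, and roughly twice that at 16-bit, which is why most teams will reach for the API instead.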

Panel Takes

The Builder


Developer Perspective

If this benchmarks out independently, it's the first time an open model has been the right answer for a coding agent deployment — not just close, but actually better. MIT license plus #1 SWE-Bench Pro would change how I architect production coding pipelines.

The Skeptic


Reality Check

Z.ai's self-reported benchmarks require independent validation. Zhipu AI has claimed top-of-leaderboard results before that didn't survive third-party testing. The 754B scale also means most developers access this through Z.ai's API anyway — at which point the MIT license is largely theoretical.

The Futurist


Big Picture

Open weights beating proprietary frontier models on software engineering is the moment the AI industry has been working toward for three years. Whether it's GLM-5.1 or its successor, this is the beginning of the end for API-only coding agent deployments. The open-source future arrived.