MiniMax's M2.5 Hits 80% on SWE-Bench Verified — The Most Capable Open API You Haven't Heard Of
Chinese AI lab MiniMax quietly shipped M2.5 with 80.2% on SWE-Bench Verified — matching Claude 3.7 Sonnet — while keeping full API access open globally. The lab also released an official CLI today, positioning M2.5 as a serious alternative for agentic coding pipelines.
MiniMax, the Chinese AI lab best known for its multimodal consumer apps, has been quietly operating at the frontier in ways most Western developers haven't noticed. Their M2.5 model posted 80.2% on SWE-Bench Verified this week — a score that matches or exceeds Claude 3.7 Sonnet on the industry's most-watched coding benchmark — while maintaining open API access at competitive pricing.
The milestone arrives alongside today's launch of the official MiniMax CLI (#8 on Product Hunt, April 10), which gives agents and terminal workflows native access to M2.5 and to MiniMax's image, video, speech, music, and web-search model stack through a unified command surface. The CLI is designed with agent-first ergonomics: clean stdout output, semantic exit codes, and async job queuing for long-running tasks.
What makes the MiniMax story interesting is the combination of full-stack multimodal capabilities with frontier coding performance at below-frontier prices. While Anthropic, OpenAI, and Google have raised enterprise API prices to match their brand position, MiniMax has been building a comprehensive API platform that competes on capability per dollar. Their T2A (text-to-audio) and video generation models have already found significant adoption in content pipelines.
The broader context: China's AI labs have been systematically closing the gap with Western frontier models, often releasing capable models at lower prices to build API market share. MiniMax's approach — quiet capability shipping rather than splashy announcements — means many developers are discovering them for the first time through benchmark leaderboards rather than PR campaigns.
For developers building agentic coding pipelines who are wary of Anthropic's pricing, M2.5 at 80% on SWE-Bench Verified is a serious alternative that merits evaluation. The new CLI makes the integration path straightforward.
Panel Takes
The Builder
Developer Perspective
“80% SWE-Bench with an open API and a purpose-built CLI is the combination I've been waiting for from a non-Anthropic provider. If the rate limits and latency hold up under production load, M2.5 deserves a serious look for cost-sensitive agentic pipelines.”
The Skeptic
Reality Check
“Chinese AI lab benchmarks should be read carefully — evaluation contamination and cherry-picked task distributions are real concerns. The 'open API' positioning also raises data residency and compliance questions for enterprise users in regulated industries.”
The Futurist
Big Picture
“MiniMax shipping frontier coding performance quietly while Western labs announce loudly is a pattern that's accelerating. The AI API market is fragmenting by use case and price point, and developers who only watch Anthropic/OpenAI/Google are going to miss where the real capability-per-dollar action is.”