Google DeepMind Releases Gemini 2.5 Ultra in Limited Preview
Google DeepMind has released a limited preview of Gemini 2.5 Ultra, claiming top benchmark scores on MMLU-Pro and LiveCodeBench. The model is accessible through Google AI Studio and Vertex AI for early testers.
Google DeepMind announced a preview release of Gemini 2.5 Ultra, the latest and largest model in the Gemini 2.5 family. The company claims the model achieves state-of-the-art scores on MMLU-Pro, a multitask language understanding benchmark, and LiveCodeBench, which evaluates models on real-world coding problems sourced after common training cutoff dates. The preview is available to select developers through Google AI Studio and enterprise customers via Vertex AI.
Gemini 2.5 Ultra follows the earlier release of Gemini 2.5 Pro, which had already established itself as a strong competitor on reasoning and coding benchmarks. The Ultra variant is positioned as the highest-capability offering in the lineup, though Google has not yet published production pricing or disclosed context window details beyond what was available for the Pro tier.
Because the release is a limited preview, most developers won't have immediate hands-on access, and the benchmark claims have not yet been independently reproduced by third parties. Google is offering the model through its existing developer infrastructure, so API access patterns should be familiar to anyone already using Gemini 2.5 Pro: the same SDKs and Vertex AI tooling apply. General availability has not been announced.
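If the upgrade really does come down to a different model identifier, one way to prepare is to keep that identifier in a single configurable place. The sketch below is illustrative only: the Ultra model ID is a placeholder (the article does not give the preview's actual identifier), and the `GEMINI_MODEL` environment variable and both helper functions are hypothetical names, not part of any Google SDK.

```python
import os
from typing import Mapping, Optional

# Model ID currently used by the application.
DEFAULT_MODEL = "gemini-2.5-pro"

# Placeholder only: the real preview identifier is not stated in the article.
HYPOTHETICAL_ULTRA_MODEL = "gemini-2.5-ultra-preview"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID, letting a GEMINI_MODEL variable override the default."""
    env = os.environ if env is None else env
    return env.get("GEMINI_MODEL", DEFAULT_MODEL)


def build_request(prompt: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Assemble request parameters; only the 'model' value changes on a swap."""
    return {"model": resolve_model(env), "contents": prompt}


if __name__ == "__main__":
    # Default configuration keeps using the Pro model.
    print(build_request("Summarize this diff.", env={}))
    # Opting in to the (hypothetical) Ultra ID is a configuration change only.
    print(build_request("Summarize this diff.",
                        env={"GEMINI_MODEL": HYPOTHETICAL_ULTRA_MODEL}))
```

With the Google Gen AI Python SDK, a dictionary shaped like this would typically be passed as keyword arguments to `client.models.generate_content(model=..., contents=...)`. Whether the Ultra preview accepts identical parameters to Pro has not been confirmed.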
Panel Takes
The Builder
Developer Perspective
“The primitive here is a higher-capacity inference endpoint sitting behind the same Gemini API surface — if you're already on the 2.5 Pro SDK, the migration path should be a model string swap, which is the right call. What I actually want to know before getting excited: does the context window increase meaningfully over Pro, and does LiveCodeBench performance hold up on codebases longer than a single file? 'State-of-the-art on LiveCodeBench' is a real benchmark with real code problems, so I'll take that seriously — but I'm not touching the preview until there's public pricing and the limited-access gate is down.”
The Skeptic
Reality Check
“MMLU-Pro and LiveCodeBench are legitimate benchmarks, but Google is both running the model and announcing the results. Independent replication doesn't exist yet, and 'limited preview' means the public can't verify anything. The direct competitors here are GPT-4.5 and Claude Opus 4, both of which are already in full production with published pricing, which means Google is asking developers to plan around a model they can't reliably access yet. Prediction: this wins on coding tasks if the LiveCodeBench numbers hold up in the wild, but the 12-month threat is Google commoditizing it into Gemini Pro pricing and making Ultra irrelevant as a separate tier.”
The Futurist
Big Picture
“The thesis Google is betting on here is falsifiable: frontier model capability will remain differentiated enough at the top tier to justify a separate Ultra SKU, and enterprises will pay a premium for it through Vertex AI rather than routing to cheaper alternatives. The second-order effect worth watching is what Ultra-grade coding performance does to the Vertex AI platform stickiness story — if developers ship production agents on Vertex using 2.5 Ultra, the switching cost isn't the model, it's the infrastructure. Google is late to the 'best coding model' positioning relative to Anthropic's run with Claude 3.5 Sonnet, but they're riding the trend that enterprise cloud contracts make model distribution a platform game, not a model quality game.”
The Founder
Business & Market
“The buyer here is clearly the enterprise architect who already has a Google Cloud contract — Vertex AI placement means this goes into existing procurement cycles, not a new budget line, which is genuinely smart distribution. The moat isn't the model itself; it's that switching off Vertex for an enterprise with existing GCP commitments has a real organizational cost. The risk is the same as always with Google's AI products: the limited preview phase drags on, competitors publish pricing first, and the developer community picks a different default before Ultra even ships to GA.”