Google DeepMind · Model · 2026-04-16

Gemini 2.5 Ultra Arrives with 'Deep Think' Mode and 2M Token Context

Google DeepMind has released Gemini 2.5 Ultra, its most capable model yet, featuring a new 'deep think' inference mode and a 2 million token context window. The model targets top benchmark rankings in coding, math, and long-context reasoning, and is available to Gemini Advanced subscribers and API users.


Google DeepMind has officially announced Gemini 2.5 Ultra, the latest and most advanced entry in its Gemini model family. The company claims state-of-the-art performance across a range of demanding benchmarks, with particular emphasis on coding tasks, mathematical reasoning, and long-context comprehension. The model is rolling out first to Gemini Advanced subscribers, with API access also available for developers from launch day.

The headline technical addition is a new 'deep think' inference mode, which allows the model to engage in extended, multi-step reasoning before producing a final response — a pattern now becoming standard among frontier models competing for top benchmark positions. Alongside this, Gemini 2.5 Ultra ships with an expanded 2 million token context window, one of the largest available in a production model, enabling use cases like full codebase analysis, lengthy document synthesis, and complex multi-turn research workflows.
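Before committing a workflow to that 2 million token window, it helps to check whether a codebase or document set even fits. The sketch below estimates a rough token budget using the common ~4-characters-per-token heuristic; the heuristic and the file-extension filter are assumptions for illustration, not the model's actual tokenizer.

```python
# Rough check of whether a codebase fits in a 2M-token context window.
# Assumes ~4 characters per token (a crude heuristic); real tokenizer
# counts vary by model and by content.
from pathlib import Path

CONTEXT_WINDOW = 2_000_000  # tokens, per the announcement
CHARS_PER_TOKEN = 4         # rough heuristic, not a real tokenizer

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def repo_fits(root: str, exts=(".py", ".md", ".ts")) -> tuple[int, bool]:
    """Sum estimated tokens over matching files; compare to the window."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total, total <= CONTEXT_WINDOW
```

A quick pre-flight like this is cheaper than discovering mid-request that a "full codebase" prompt silently truncates.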

The announcement positions Google directly against OpenAI's o-series models and Anthropic's Claude 3.7 Sonnet in the high-stakes race for reasoning supremacy. While benchmark claims from model providers should always be read carefully — evaluations are often curated to favor the announcing model — the combination of a massive context window and a dedicated thinking mode does represent a meaningful capability profile, particularly for enterprise and developer use cases that demand both depth and breadth.

Access through Gemini Advanced makes the model immediately available to a broad consumer base, while the simultaneous API release signals Google's intent to compete aggressively for developer mindshare. Whether Gemini 2.5 Ultra holds up in real-world, production-grade tasks — beyond curated benchmarks — will be the true test that the developer community is already preparing to run.

Panel Takes

The Builder


Developer Perspective

A 2M token context window on day one through the API is genuinely useful — I can finally throw an entire monorepo at it without chunking hacks. The 'deep think' mode is interesting, but I need to benchmark its latency cost before I'd consider it for anything user-facing. Cautiously optimistic, but the proof will be in the evals I run this week.
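Measuring that latency cost is straightforward to set up ahead of time. The harness below compares median wall-clock latency with and without an extended-reasoning mode; the model call is a stub, and `deep_think` is a hypothetical parameter name for illustration, not a confirmed API field.

```python
# Minimal latency-comparison harness for an extended-reasoning mode.
# `call_model` is a stub standing in for a real API client, and
# `deep_think` is a hypothetical flag name, not a documented parameter.
import statistics
import time

def call_model(prompt: str, deep_think: bool = False) -> str:
    """Stand-in for a real API call; swap in the actual client here."""
    time.sleep(0.05 if deep_think else 0.005)  # simulate extra thinking time
    return "response"

def benchmark(prompt: str, runs: int = 5, **kwargs) -> float:
    """Median wall-clock latency in seconds over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model(prompt, **kwargs)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

baseline = benchmark("Summarize this PR", deep_think=False)
thinking = benchmark("Summarize this PR", deep_think=True)
print(f"baseline {baseline:.3f}s vs deep think {thinking:.3f}s")
```

Running the same prompts through both paths makes the depth-versus-responsiveness tradeoff concrete before anything ships to users.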

The Skeptic


Reality Check

Every major lab announces 'state-of-the-art' on benchmarks they conveniently curate — Gemini 2.5 Ultra is unlikely to be an exception. 'Deep think' mode sounds compelling until you realize it's essentially the same extended compute trick everyone else is shipping under different branding. I'll reserve judgment until independent third-party evaluations come in.

The Futurist


Big Picture

A 2 million token context window isn't just a spec bump — it's a quiet architectural shift in how AI systems can hold and navigate knowledge over long horizons. Combined with deeper reasoning, models like this start to look less like smart autocomplete and more like persistent analytical agents. We're watching the infrastructure for AI-native workflows get quietly cemented in real time.

The Creator


Content & Design

The multimodal angle is what catches my eye — if Gemini 2.5 Ultra can reason deeply across text, images, and code in a single long context, that opens up genuinely new creative and production workflows. Imagine feeding it an entire brand guideline, a design system, and a content brief and getting coherent output across all three. That's the use case I'm quietly excited to test.