Gemini 2.5 Ultra Arrives with 2M Token Context Window

Google DeepMind has launched Gemini 2.5 Ultra, its most capable model yet, featuring a 2 million token context window alongside improved reasoning and multimodal capabilities. The model is available now to Gemini Advanced subscribers and through the Google AI Studio API.

Google DeepMind today announced Gemini 2.5 Ultra, the newest flagship in its Gemini model family. The headline feature is a 2 million token context window — double what most frontier competitors currently offer at scale — which DeepMind says enables the model to process entire codebases, lengthy legal documents, or hours of video in a single pass without retrieval augmentation.

Beyond raw context length, Google is claiming meaningful improvements in reasoning benchmarks and multimodal understanding, including stronger performance on code generation, math, and long-document synthesis tasks. The model reportedly builds on the architecture of Gemini 2.5 Pro, with additional compute at inference time and refinements to instruction-following behavior. No methodology for the benchmark claims has been published alongside the announcement.

Access is rolling out in two tracks: consumer-facing availability through Gemini Advanced subscriptions and API access via Google AI Studio for developers, with Vertex AI integration expected to follow. Pricing for API usage at the 2M-token tier has not been fully disclosed. That is a notable gap: cost scales with input length, so the per-token rate will determine whether the feature is practical for most workloads.
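
To make the scaling point concrete, here is a minimal Python sketch of how input cost grows with prompt size under a simple linear per-token rate. The rate used is a placeholder chosen only for illustration, since no 2M-tier price has been disclosed; the function name and the linear pricing model are likewise assumptions, and real pricing may be tiered by context length.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost under an assumed linear model: tokens times a per-million rate."""
    if input_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("tokens and rate must be non-negative")
    return input_tokens / 1_000_000 * usd_per_million_tokens


# HYPOTHETICAL placeholder rate, not a published Gemini price.
PLACEHOLDER_RATE_USD = 1.00

if __name__ == "__main__":
    # Compare a short prompt, a mid-size prompt, and a full 2M-token prompt.
    for tokens in (50_000, 500_000, 2_000_000):
        cost = input_cost_usd(tokens, PLACEHOLDER_RATE_USD)
        print(f"{tokens:>9,} input tokens: ${cost:.2f} per call")
```

Under a linear model the full-window call costs 40 times the 50K-token call regardless of the rate, so the rate itself decides whether loading everything into context is affordable at production volume.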

The 2M token context window positions Gemini 2.5 Ultra against models like Claude 3.7 and GPT-4.1, both of which have made long-context capability a competitive battleground in 2026. Whether the practical recall and reasoning quality across that full 2M window holds up under real-world conditions — rather than synthetic benchmarks — remains the open question that enterprise buyers will need to stress-test before committing.
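
One common way to stress-test recall across a long window is a "needle in a haystack" probe: plant a known fact at varying depths in filler text and check whether the model can retrieve it. The Python sketch below shows the idea. Every name in it is hypothetical, and `ask_model` is a stand-in for whatever API client a team actually uses; no real Gemini call is shown. The fake model in the demo simply truncates its input to imitate recall that degrades with depth.

```python
from typing import Callable, Dict, List, Sequence


def build_probe(filler: List[str], needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    return " ".join(filler[:index] + [needle] + filler[index:])


def recall_by_depth(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer text out
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: Sequence[float],
) -> Dict[float, bool]:
    """Return, for each depth, whether the model's answer contains `expected`."""
    results = {}
    for depth in depths:
        prompt = build_probe(filler, needle, depth) + "\n\n" + question
        results[depth] = expected.lower() in ask_model(prompt).lower()
    return results


def fake_model(prompt: str) -> str:
    """Toy stand-in that only 'sees' the first 5,000 characters of the prompt."""
    return "7431" if "7431" in prompt[:5000] else "unknown"


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    report = recall_by_depth(
        fake_model, filler, "The vault code is 7431.",
        "What is the vault code?", "7431", depths=(0.0, 0.5, 1.0),
    )
    print(report)
```

A real evaluation would swap in an actual client for `fake_model`, scale the filler toward the full window, and use realistic documents rather than synthetic sentences, since the open question is precisely whether synthetic results carry over.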

Panel Takes

The Builder

Developer Perspective

“The primitive here is straightforward: a long-context transformer available via a REST API, and the DX bet Google is making is that 2M tokens is a retrieval replacement — skip the vector DB, just chuck the whole codebase in. That's a compelling primitive if the pricing at high token counts doesn't make it prohibitive, which they've conspicuously not disclosed. I'll reserve judgment until I see what a 500K-token API call actually costs and whether the rate limits make it usable for anything beyond toy demos.”

The Skeptic

Reality Check

“Every frontier lab is now claiming the biggest context window and the best reasoning benchmarks, all measured by themselves — this announcement is no different, and the absence of published methodology on those benchmark improvements is a yellow flag, not a minor caveat. The real test is recall quality at 1.5M tokens in, not at position 50K where every model looks good, and that number isn't in this announcement. What kills this in 12 months isn't a competitor — it's pricing: if 2M-token calls cost more than the retrieval pipeline they're supposed to replace, the feature is a demo, not a product.”

The Futurist

Big Picture

“The thesis baked into a 2M token window isn't just 'bigger context' — it's that retrieval-augmented generation is a transitional architecture, and that by 2027 well-resourced teams will trade query-time retrieval complexity for brute-force in-context loading, shifting the hard problem from indexing to cost-per-token. That thesis depends on inference costs continuing to fall faster than context window demand grows, which is plausible but not guaranteed. The second-order effect nobody is talking about: if the whole document fits in context, the power shifts from whoever controls the index back to whoever controls the raw data — a quiet win for enterprises that have hoarded unstructured content and a quiet loss for the RAG tooling ecosystem built up over the last two years.”

The Founder

Business & Market

“The buyer here is the enterprise AI team that currently pays for a RAG pipeline — embeddings, vector DB licenses, retrieval tuning — and Google is betting they'll trade that complexity for a higher API bill with less operational overhead. That's a real wedge, but the missing pricing for the 2M token tier is doing a lot of work in this announcement: the math only works for the buyer if a single long-context call is cheaper than the full retrieval stack it replaces, and without that number, nobody can run the ROI calculation. Google's distribution moat through Workspace and Vertex is real, but this feature needs transparent pricing to close deals, not a press release.”
