Gemini 2.5 Ultra Arrives with 2M Token Context and Multimodal Reasoning
Google DeepMind launched Gemini 2.5 Ultra with a 2-million-token context window and improved multimodal reasoning across text, code, audio, and video. The model is available now through the Gemini API and Google AI Studio.
Google DeepMind released Gemini 2.5 Ultra today, marking a significant expansion in both context capacity and cross-modal reasoning. The model supports a 2-million-token context window, enough to hold roughly 1,500 pages of dense text, several hours of audio, or long video sequences in a single inference call. Google is positioning this as a step-change for tasks that require sustained reasoning over large inputs: legal document analysis, long-form code reviews, multi-hour meeting transcription with structured output, and video-grounded Q&A.
The multimodal improvements span text, code, audio, and video, with Google claiming stronger performance on benchmarks that require joint reasoning across modalities rather than treating each input stream independently. The model is available immediately through the Gemini API and Google AI Studio, giving developers direct access without a waitlist. Google has not fully disclosed pricing for the 2M context tier at launch, and that pricing will be a material factor in adoption for high-throughput use cases.
The 2-million-token ceiling doubles what was previously available in Gemini 1.5 Pro and puts it ahead of most publicly accessible models on raw context length. Competitors including Anthropic's Claude and OpenAI's GPT-4o family have also pushed context windows in recent cycles, so context length remains a headline metric in an ongoing arms race. The practical question is whether retrieval and reasoning quality degrade at the far end of a 2M context call, a known failure mode across long-context models. Developers will need to evaluate that against their own workloads.
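One way developers might probe that degradation on their own workloads is a simple needle-in-a-haystack sweep: plant a known fact at different depths in a long filler prompt and check whether the model can still retrieve it. The Python sketch below is a minimal, model-agnostic illustration and not Google's evaluation method. The names `build_haystack`, `run_depth_sweep`, and `ask_model` are hypothetical; `ask_model` stands in for whatever function wraps your actual model call, and the substring check is a deliberately crude scoring rule.

```python
from typing import Callable, Dict, List

# Repeated filler sentence used to pad the prompt (illustrative choice).
FILLER = "The quick brown fox jumps over the lazy dog. "


def build_haystack(needle: str, depth: float, total_chars: int) -> str:
    """Build roughly total_chars of filler with the needle placed at a
    relative depth: 0.0 puts it at the start, 1.0 at the end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    repeats = max(1, total_chars // len(FILLER))
    sentences = [FILLER] * repeats
    position = round(depth * repeats)
    sentences.insert(position, needle + " ")
    return "".join(sentences)


def run_depth_sweep(
    ask_model: Callable[[str], str],  # hypothetical wrapper around a model call
    needle: str,
    question: str,
    expected: str,
    depths: List[float],
    total_chars: int,
) -> Dict[float, bool]:
    """Return, for each depth, whether the model's answer contains the
    expected string (case-insensitive substring match)."""
    results: Dict[float, bool] = {}
    for depth in depths:
        prompt = build_haystack(needle, depth, total_chars) + "\n\n" + question
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results
```

In practice the haystack would be sized in tokens rather than characters and built from realistic documents, since repeated filler makes the needle unusually easy to spot. A retrieval rate that drops at particular depths is the signal to look for.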
Panel Takes
The Builder
Developer Perspective
“The primitive here is a long-context multimodal inference endpoint — that's it, stripped down. The right question for DX is whether the API surface for sending mixed-modality inputs at 2M tokens is clean or a configuration nightmare, and Google's track record on this is mixed: AI Studio is genuinely useful for prototyping, but the Gemini API has historically had rough edges around file handling and multimodal payloads. The moment of truth is when you try to send a 90-minute video alongside a code file and a PDF in the same call — if that works with a sane SDK and predictable latency, this earns a ship; if it requires three undocumented workarounds, it doesn't matter what the benchmark says.”
The Skeptic
Reality Check
“The 2M token number is the headline, but the real question is what happens to reasoning quality at token 1,800,000 — every long-context model released in the last two years has shown degradation in the middle and far end of the context window, and Google hasn't published methodology on how they've addressed that here. The direct competitor is Claude 3.7 with its own long-context positioning, and the differentiator will come down to per-token pricing on the high end, which Google hasn't disclosed, which is itself a signal. I'll ship this when the pricing is public and a third-party evals shop runs needle-in-a-haystack tests at full context depth — until then, the 2M number is marketing.”
The Futurist
Big Picture
“The thesis embedded in a 2M token multimodal model is specific and falsifiable: that the bottleneck for enterprise AI workflows is context capacity, not generation quality — that if you can fit the whole document, the whole codebase, the whole meeting recording into one call, a new class of application becomes structurally possible that couldn't exist before. The second-order effect that nobody is talking about is what this does to retrieval-augmented generation as an architecture: if context windows keep growing at this rate, RAG pipelines become engineering debt rather than a best practice, and the tooling ecosystem built around chunking and vector stores faces a serious relevance problem by 2027. Google is on-time to this trend, not early — Anthropic got there first with 200K on Claude 3 — but the jump to 2M on a multimodal model is a meaningful escalation of the bet.”
The Founder
Business & Market
“The buyer for a 2M token multimodal model is unambiguously enterprise — legal, finance, media, and any vertical that drowns in long unstructured documents — and that buyer has a budget line for this, which is the right starting point. The problem is that Google hasn't disclosed pricing for the 2M context tier at launch, and in my experience that means it's expensive enough that they don't want it as the first thing you read; that's a friction point for any startup trying to build on top of this and model unit economics before committing. The moat question is real: Google controls the model, the API, the Studio, and the underlying infrastructure, so any product built on this is essentially a distribution bet on Google not shipping your feature natively inside Workspace — which is a bet I wouldn't take without a strong workflow lock-in story.”