OpenAI · Model · 2026-06-01

GPT-5 Launches with Native Multimodal Reasoning and 1M Token Context

OpenAI has released GPT-5, a natively multimodal model supporting text, images, audio, and video reasoning with a one-million-token context window. It's available today via API and ChatGPT Plus, with enterprise tiers rolling out over the next week.


OpenAI today released GPT-5, its most capable model to date, with native multimodal reasoning across text, images, audio, and video built into a single unified architecture rather than bolted together from separate specialized models. The headline capability is a one-million-token context window, which puts long-document analysis, large codebases, and extended multi-session conversations within reach of a single API call. The model is immediately available to ChatGPT Plus subscribers and through the API, with enterprise access rolling out over the following week.

The shift from modality-specific pipelines to a unified reasoning architecture is the substantive engineering claim here. Earlier multimodal systems typically routed different input types through specialized subsystems before merging representations, although OpenAI described GPT-4o in similar end-to-end terms at its launch. GPT-5's architecture reportedly reasons across modalities in a shared representational space, which OpenAI says reduces hallucination rates on cross-modal tasks, though independent benchmark verification of that claim is still pending at launch.

The one-million-token context window is the other major practical upgrade. For developers, this means entire large codebases, legal documents, or hours of transcribed audio can be processed in a single prompt without chunking workarounds. The practical limits — latency, cost-per-token at that context length, and whether the model actually attends to information at the far end of a 1M-token prompt — will surface quickly in production use.
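Whether a given corpus fits in a single prompt can be estimated before sending anything. The sketch below is an illustration only: the four-characters-per-token ratio is a rough heuristic for English text, not GPT-5's actual tokenizer, and the `reserved_output_tokens` default and function names are hypothetical.

```python
import math

CONTEXT_LIMIT = 1_000_000  # window size stated in the article
CHARS_PER_TOKEN = 4.0      # assumption: rough heuristic for English prose


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate the token count of a string from its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(documents, context_limit=CONTEXT_LIMIT, reserved_output_tokens=8_000):
    """Return (fits, estimated_total_tokens) for a list of document strings.

    reserved_output_tokens is a hypothetical allowance for the model's reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = context_limit - reserved_output_tokens
    return total <= budget, total
```

An estimate like this only answers whether the prompt fits. It says nothing about latency, cost, or how well the model attends to the far end of the window.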

Pricing and rate limits for the new model have been published on OpenAI's API pricing page, with higher context tiers priced above GPT-4o equivalents. Enterprise pricing follows the existing tiered structure. The model replaces GPT-4o as the default in ChatGPT Plus starting today, though users can still select previous models from the model picker.

Panel Takes

The Builder

Developer Perspective

The primitive here is a single unified inference endpoint that handles text, images, audio, and video without you stitching together a pipeline, and the 1M-token context means chunking hacks can finally die. The DX bet is that OpenAI puts all the complexity in the model and keeps the API surface flat, which is the right call. What I'm watching in the first week: actual latency numbers at 500K+ tokens in production, because a long-context request that takes 45 seconds to return isn't a feature, it's a liability.

The Skeptic

Reality Check

'Native multimodal reasoning in a shared representational space' is doing a lot of work in this announcement. That's an architectural claim OpenAI has made before with GPT-4o, and the gap between the pitch and the cross-modal performance was real. The 1M context window is shipping, but the critical question nobody can answer at launch is retrieval fidelity at the far end of that window, which is where every long-context model so far has degraded badly. I'll ship this when independent evals on cross-modal reasoning and needle-in-a-haystack at 900K tokens exist. Until then, treat the benchmark claims as self-reported.
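For readers unfamiliar with the needle-in-a-haystack idea, here is a minimal sketch of how such a probe can be set up. It is not an established benchmark harness: `build_haystack` and `run_probe` are hypothetical helpers, sizes are measured in characters rather than tokens, and `ask_model` is a placeholder for whatever function sends a prompt to a model and returns its text reply.

```python
from typing import Callable, Dict, Iterable


def build_haystack(filler: str, needle: str, target_chars: int, depth: float) -> str:
    """Build roughly target_chars of filler with the needle at a relative depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    if not filler:
        raise ValueError("filler must be non-empty")
    sentences = [filler] * (target_chars // len(filler) + 1)
    sentences.insert(round(depth * len(sentences)), needle)
    return " ".join(sentences)


def run_probe(
    ask_model: Callable[[str], str],  # placeholder: supplied by the caller
    question: str,
    needle: str,
    expected: str,
    filler: str,
    target_chars: int,
    depths: Iterable[float],
) -> Dict[float, bool]:
    """Record, for each depth, whether the model's answer contains the expected string."""
    results = {}
    for depth in depths:
        prompt = build_haystack(filler, needle, target_chars, depth) + "\n\n" + question
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results
```

Sweeping `depths` toward 1.0 at large `target_chars` is what would expose the far-end degradation described above. A substring match is a deliberately crude scoring rule, and a real evaluation would need varied filler and several needles.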

The Futurist

Big Picture

The thesis GPT-5 is betting on is falsifiable: unified multimodal reasoning collapses the category of 'specialized AI tools' because a single model that reasons across all modalities at sufficient quality removes the reason to route tasks to domain-specific models. The second-order effect nobody is talking about is what 1M-token context does to the memory and retrieval infrastructure market: RAG pipelines and vector databases built around 128K context limits just had their core use case disrupted overnight. This is on time for the multimodal trend but early for the context-length trend, meaning the infrastructure to actually consume 1M tokens at production scale doesn't exist yet outside of OpenAI's own stack.

The Founder

Business & Market

The moat question here is the same one it's always been with OpenAI: they're selling a commodity that they control the pricing on, and every enterprise customer building on this API is one price change or model deprecation away from a hard conversation. The 1M context window is real pricing leverage, though: enterprise deals for legal, finance, and life sciences that need to process large document corpora have a new default answer, and that's a defined buyer with a real budget. What I'd stress-test is whether the per-token cost at high context lengths actually pencils out for the workflows that need it most, or whether the economics push production use cases back to 128K anyway.
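The stress test described above is simple arithmetic once real prices are plugged in. The sketch below shows the shape of that comparison. The dollar figures are placeholders chosen for illustration, not OpenAI's published prices, and `call_cost` is a hypothetical helper that assumes flat per-token pricing with no context-length tiers.

```python
# Placeholder prices in dollars per million tokens. These are NOT real prices;
# replace them with the figures from the provider's pricing page.
PLACEHOLDER_INPUT_PRICE = 10.00
PLACEHOLDER_OUTPUT_PRICE = 30.00


def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Dollar cost of one request under flat per-token pricing (an assumption)."""
    total = input_tokens * input_price_per_million + output_tokens * output_price_per_million
    return total / 1_000_000


# Same question answered two ways: the whole corpus in the prompt,
# versus a retrieval step that trims the prompt to 128K tokens.
full_context = call_cost(1_000_000, 2_000, PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE)
trimmed = call_cost(128_000, 2_000, PLACEHOLDER_INPUT_PRICE, PLACEHOLDER_OUTPUT_PRICE)

print(f"full-context call: ${full_context:.2f}")
print(f"128K call:         ${trimmed:.2f}")
print(f"ratio:             {full_context / trimmed:.1f}x")
```

With flat pricing the ratio is driven almost entirely by input length. Tiered pricing at higher context lengths, which the article says applies here, would widen the gap further.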
