Claude 4 Arrives: 1M Token Context and Extended Thinking Mode

Anthropic has released Claude 4, featuring a 1 million token context window, enhanced extended thinking mode, and benchmark results that Anthropic claims surpass previous state-of-the-art on coding and reasoning tasks. The model is available immediately via API and Claude.ai.

Original source

Anthropic has shipped Claude 4, its most capable model to date, with a headline 1 million token context window that puts it in direct competition with Gemini 1.5 Pro's long-context capabilities. The model ships with what Anthropic calls "extended thinking mode," a chain-of-thought reasoning layer the company says produces meaningfully better results on complex multi-step problems including math, code generation, and logical inference.

The context window expansion is the most immediately practical change for developers. One million tokens translates to roughly 750,000 words — enough to ingest an entire large codebase, a full novel, or years of conversation history in a single prompt. Whether latency and cost remain workable at that context length for production use cases is a question Anthropic's pricing and documentation will need to answer clearly.

The extended thinking feature builds on the pattern established by OpenAI's o-series and Google's Gemini thinking models, where the model allocates additional compute to "thinking" before responding. Anthropic has positioned this as a toggleable mode rather than a separate model, which is a different product bet than OpenAI's approach of shipping o3 and GPT-4o as distinct endpoints.

Benchmark claims from Anthropic show Claude 4 surpassing prior state-of-the-art on coding and reasoning evaluations, though these benchmarks were run and reported by Anthropic itself. Independent third-party evaluation will be the real signal. The model is available now through the Anthropic API and directly via Claude.ai, with pricing details available on Anthropic's pricing page.

Panel Takes

The Builder

Developer Perspective

“The primitive here is straightforward: a long-context, chain-of-thought model accessible via a single API endpoint with thinking mode as a parameter flag rather than a separate model ID — that's the right call, it means no routing logic in your application code. The DX bet is that developers want one model that adapts rather than a fleet of specialized endpoints, and I think that's correct. What I need to see before shipping anything against this: real latency numbers at 500k+ token context, and whether the extended thinking token usage is metered separately or baked into the response cost, because that detail will break or make most production budgets.”

The Skeptic

Reality Check

“Anthropic benchmarked Anthropic's model and found it was the best — alert the press. The 1M token context is real and useful, but Gemini 1.5 Pro has been at 1M tokens for over a year, so this is table stakes, not a leap. The scenario where this breaks: anyone doing retrieval-heavy production workloads will hit cost and latency walls long before the context limit matters, and the extended thinking mode adds tokens that aren't free. What kills this in 12 months isn't a competitor — it's that OpenAI and Google will both have equivalent or larger context with cheaper per-token pricing, and Anthropic's moat is "vibes" and Constitutional AI, neither of which survives a 10x price drop from a larger infrastructure player.”

The Futurist

Big Picture

“The real thesis Claude 4 is betting on: by 2027, the economically significant AI work is done by models that hold entire systems in context simultaneously — full codebases, complete legal records, end-to-end business processes — rather than models that retrieve and chunk. That's a falsifiable claim and 1M tokens is a direct infrastructure bet on it. The second-order effect that nobody is talking about yet is what happens to the retrieval-augmented generation market — if context windows get large enough and cheap enough, RAG pipelines become an optimization rather than a necessity, which collapses an entire layer of the current AI infrastructure stack. Anthropic is riding the cost-per-token deflation trend and is roughly on time, but the bet only pays off if inference costs drop another order of magnitude in the next 18 months.”

The PM

Product Strategy

“The job-to-be-done split here is interesting and slightly concerning: extended thinking serves the "solve a hard problem" job, while 1M context serves the "work with all my stuff at once" job — those are different users with different workflows, and shipping them together in one model announcement makes the product story muddy. The smart product decision is the single-endpoint approach with thinking as a toggle, because it means users don't have to decide which model to route to before they know how hard their problem is. The gap I'd flag: does the Claude.ai interface actually let non-API users take advantage of 1M token context in a workflow that makes sense, or is it just a number on the spec sheet that only developers can operationalize?”

Panel Takes

Bookmarks