Cohere Command R Ultra: 128K Context and Smarter Tool Use
Cohere's Command R Ultra arrives with a 128K-token context window and what Cohere describes as meaningfully improved multi-step tool use and RAG performance, aimed squarely at enterprise knowledge management and agentic workflows. The model is available now through Cohere's API and AWS Bedrock.
Cohere has launched Command R Ultra, an upgrade to its enterprise-focused model line that extends the context window to 128K tokens and ships with what the company describes as substantially improved multi-step tool use and retrieval-augmented generation (RAG) performance. The release targets organizations running large-scale document retrieval pipelines and agentic workflows where models need to coordinate multiple tool calls in sequence without losing the thread.
The 128K context window puts Command R Ultra in the same bracket as competing enterprise models from Anthropic and Google, but Cohere's differentiation has always leaned on deployment flexibility and data privacy, and day-one availability through both its own API and AWS Bedrock reinforces that positioning. Enterprise buyers who can't route sensitive documents through consumer-facing AI services have historically been Cohere's core constituency.
The multi-step tool use improvements are the more operationally significant claim here. RAG pipelines that require agents to retrieve, evaluate, and act across multiple tools in a single session have been a known failure mode for instruction-tuned models: the model either hallucinates tool calls or loses context between steps. Cohere hasn't published a detailed benchmark methodology alongside the launch, so the performance claims need independent validation before production teams anchor hard decisions on them.
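To make those two failure modes concrete, here is a minimal, framework-agnostic sketch of a multi-step tool loop that guards against both. It is not Cohere's API: `ToolCall`, `Final`, `run_agent`, `HallucinatedToolError`, and the `search`/`fetch` tools are hypothetical names, and a scripted stub stands in for the model. The loop rejects calls to tools that were never registered, and it passes the full history of prior calls and results into every step so each decision can see the whole thread rather than only the last output.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical action types for illustration; these are not part of
# Cohere's SDK or the Bedrock API.
@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Final:
    answer: str


Action = Union[ToolCall, Final]


class HallucinatedToolError(Exception):
    """Raised when the model requests a tool that was never registered."""


def run_agent(
    model_step: Callable[[list], Action],
    tools: dict[str, Callable[..., str]],
    max_steps: int = 12,
) -> tuple[str, list]:
    """Run a multi-step tool loop, validating every tool call.

    `model_step` stands in for an LLM call: it receives the full history of
    (tool name, args, result) tuples and returns the next action.
    """
    history: list = []
    for _ in range(max_steps):
        action = model_step(history)
        if isinstance(action, Final):
            return action.answer, history
        if action.name not in tools:
            # Guard 1: reject calls to tools that do not exist.
            raise HallucinatedToolError(action.name)
        result = tools[action.name](**action.args)
        # Guard 2: keep every result so later steps see the whole thread.
        history.append((action.name, action.args, result))
    raise RuntimeError(f"no final answer after {max_steps} steps")


# Usage with a scripted stub in place of a real model (hypothetical tools).
def scripted_model(history: list) -> Action:
    if not history:
        return ToolCall("search", {"query": "refund policy"})
    if len(history) == 1:
        return ToolCall("fetch", {"doc_id": history[-1][2]})
    return Final(f"answer based on {history[-1][2]}")


tools = {
    "search": lambda query: "doc-42",
    "fetch": lambda doc_id: f"contents of {doc_id}",
}

answer, trace = run_agent(scripted_model, tools)
print(answer)  # answer based on contents of doc-42
```

The `max_steps` cap is a deliberate choice: a model that keeps calling tools without converging fails loudly instead of looping, and raising the cap to 10 or more is one simple way to probe how long a given model stays coherent across sequential calls.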
Command R Ultra represents Cohere's continued push to own the enterprise API layer rather than the consumer AI surface. With model commoditization accelerating across the industry, the bet is that enterprise buyers will pay a premium for a model optimized for their specific deployment constraints — private cloud, compliance-friendly infrastructure, and workflows that require reliable tool orchestration rather than general-purpose chat.
Panel Takes
The Builder
Developer Perspective
“The primitive here is a hosted enterprise LLM with native tool-use support and a 128K context window available on day one via both a direct API and Bedrock: that's a real deployment path, not a waitlist. The DX bet Cohere has consistently made is to put tool definitions and RAG connectors in the API rather than in a separate orchestration layer, which means you're not forced to adopt their platform wholesale. What I'd want to verify immediately: does the multi-step tool use hold up across 10+ sequential calls in a single session, or does it start degrading by step three, like most models do under real agentic load?”
The Skeptic
Reality Check
“128K context is table stakes in 2026: Anthropic, Google, and OpenAI all ship it, so the headline number isn't the differentiator; the execution is. The multi-step tool use claim is where I'd pump the brakes. Cohere hasn't published a benchmark methodology, which means 'substantially improved' is marketing language until someone runs this against a standardized agentic eval suite like ToolBench or BFCL. What kills this in 12 months isn't a competitor; it's AWS or Azure shipping a fine-tuned enterprise model at the infrastructure layer that makes the independent API layer redundant for the exact buyers Cohere is targeting.”
The Founder
Business & Market
“The buyer here is a VP of Engineering or Chief Architect at a regulated enterprise — legal, finance, healthcare — who needs a capable model that can run in a compliant cloud environment without routing sensitive documents through OpenAI's servers, and that budget comes from IT or AI transformation spend, not an innovation slush fund. The AWS Bedrock availability is the real business decision in this launch: Bedrock access means Cohere gets to ride AWS's enterprise sales motion and existing procurement relationships instead of building their own. The moat question is whether 'enterprise-optimized RAG and tool use' stays defensible when the hyperscalers start fine-tuning their own foundation models on the same enterprise deployment patterns — Cohere has probably 18 months before that threat is real.”
The Futurist
Big Picture
“The thesis Command R Ultra is betting on is specific and falsifiable: enterprise AI value in the next two years will be captured by models that reliably orchestrate multi-step tool use across private data systems, not by models with the highest general benchmark scores. The dependency that has to hold is that enterprises continue to run significant workloads in private cloud or hybrid environments rather than accepting consumer AI data terms — if regulatory pressure on AI data handling softens, Cohere's deployment-flexibility moat shrinks fast. The second-order effect worth watching is what reliable multi-step tool use does to the middleware layer: if the model itself handles orchestration coherently, a lot of LangChain-style orchestration frameworks become unnecessary complexity rather than essential plumbing.”