Nvidia | Infrastructure | 2026-06-04

Nvidia Blackwell Ultra B300 GPUs Land at AWS, Azure, and CoreWeave

Nvidia's Blackwell Ultra B300 GPUs are now available to select cloud partners including AWS, Azure, and CoreWeave, delivering up to 50% higher inference throughput than the B200. Enterprise general availability is expected later this quarter.

Nvidia has begun shipping its Blackwell Ultra B300 GPUs to select cloud partners, with AWS, Azure, and CoreWeave confirmed as early recipients. The B300 represents the top tier of the Blackwell architecture, offering up to 50% higher inference throughput compared to the standard B200 — a meaningful jump for workloads bottlenecked by token generation speed rather than model loading or context handling.

The timing aligns with accelerating demand for inference capacity, particularly as larger reasoning models and long-context applications push throughput requirements well beyond what H100-era hardware was designed to handle. Cloud providers getting early access will be able to price and provision B300-backed instances ahead of broader enterprise rollout, giving them a window to establish workload migration paths and pricing tiers before general availability.

Nvidia has not published detailed architectural specifications beyond the throughput figure, and independent benchmarks from the cloud partners are not yet available. The 50% claim is Nvidia's own, and real-world gains will vary significantly by model architecture, batch size, and memory bandwidth utilization. Enterprise customers waiting for GA should expect a fuller picture once cloud providers publish their own performance data alongside instance pricing.

General availability for enterprise customers is expected later this quarter, which puts the broader rollout roughly in line with the cadence Nvidia established with the B200 transition. For organizations currently running inference on H100 or A100 fleets, the B300's availability through major cloud providers represents the clearest upgrade path yet — assuming the throughput claims hold up under production workloads.

Panel Takes

The Builder

Developer Perspective

The primitive here is raw inference throughput delivered as a managed cloud resource — no new API surface, just faster hardware underneath existing endpoints. The 50% throughput claim is Nvidia's number with no methodology attached, which means I'm not citing it in any architecture decision until AWS or CoreWeave publish their own benchmarks with real batch sizes and model configs. What I actually care about: whether the per-token cost drops proportionally, because faster hardware that just lets cloud providers charge the same rate for less wall-clock time isn't a win for anyone running inference at scale.

The Skeptic

Reality Check

'Up to 50% higher inference throughput' is doing a lot of work in this announcement: that ceiling number was authored by Nvidia, and the floor is presumably close to zero improvement on workloads that aren't memory-bandwidth-bound. The real test is what AWS and Azure charge per GPU-hour for B300 instances versus B200, because if the price premium outpaces the throughput gain, cost per token goes up, and most inference operators will sit on existing fleet allocations and wait for spot pricing to normalize. This is a supply chain announcement dressed as a performance announcement, and those are different things.
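To make the premium-versus-throughput comparison concrete, here is a minimal Python sketch of the arithmetic. The function names, prices, and token rates are hypothetical placeholders chosen for illustration; no B300 or B200 instance pricing or independent benchmarks have been published.

```python
def cost_per_million_tokens(price_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


def cost_ratio(price_premium: float, throughput_gain: float) -> float:
    """New cost per token divided by old cost per token.

    Both arguments are fractions (0.5 means +50%).
    A result below 1.0 means the newer instance is cheaper per token.
    """
    return (1 + price_premium) / (1 + throughput_gain)


# Assumed placeholder baseline, NOT a published price or benchmark.
baseline_price = 10.0   # dollars per GPU-hour (hypothetical)
baseline_tps = 1000.0   # tokens per second (hypothetical)

# (price premium, realized throughput gain) scenarios, all hypothetical.
scenarios = [(0.30, 0.50), (0.50, 0.50), (0.30, 0.10)]

old_cost = cost_per_million_tokens(baseline_price, baseline_tps)
for premium, gain in scenarios:
    new_cost = cost_per_million_tokens(
        baseline_price * (1 + premium),
        baseline_tps * (1 + gain),
    )
    verdict = "cheaper" if new_cost < old_cost else "not cheaper"
    print(f"premium={premium:.0%} gain={gain:.0%} "
          f"ratio={cost_ratio(premium, gain):.3f} ({verdict} per token)")
```

The break-even point is where the price premium equals the realized throughput gain. Because the 50% figure is a ceiling, the ratio is worth recomputing with the gain actually measured on a given model and batch size.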

The Futurist

Big Picture

The thesis Nvidia is betting on: inference demand will grow faster than any single hardware generation can absorb, which means the cadence of GPU releases needs to compress rather than expand. B300 availability at cloud partners before enterprise GA is structurally interesting — it means the hyperscalers are becoming the validation layer for hardware claims, which shifts power toward whoever runs the largest inference fleets. If that trend holds, the second-order effect is that enterprises increasingly have no path to on-prem competitiveness for frontier inference, and the 'cloud repatriation' narrative for AI workloads quietly dies.

The Founder

Business & Market

The buyer here is any company running inference at a scale where a 50% throughput increase translates directly to a lower cost-per-call — that's a defined, measurable value prop and a real budget line item in ML infrastructure spend. What I'd stress-test: CoreWeave's inclusion alongside AWS and Azure is the interesting signal, because it confirms that specialized inference cloud providers are being treated as first-tier partners, which has real implications for whether enterprises use CoreWeave as a cost lever against the hyperscalers. Nvidia's moat deepens with every generation because switching to alternative silicon requires revalidating every production model and kernel — that lock-in is worth more than any single benchmark number.
