Nvidia Blackwell Ultra B300 GPUs Now Shipping to AWS, Azure, and GCP

Nvidia has confirmed that its Blackwell Ultra B300 GPUs are now shipping to Amazon Web Services, Google Cloud, and Microsoft Azure. The new hardware delivers up to 1.5x inference throughput compared to the B200 at the same power envelope, a meaningful efficiency gain for cloud providers whose cost structures are tightly coupled to rack power and cooling.

The B300 represents an incremental step within the Blackwell architecture rather than a full generational leap. The throughput gains appear to come primarily from higher memory bandwidth and refined on-chip interconnects, making the upgrade particularly relevant for large-model inference workloads where memory-bound operations dominate. Nvidia has not published detailed methodology for the 1.5x figure, and independent third-party benchmarks are not yet available.

For cloud customers, access to B300 capacity will depend on each provider's rollout timeline; availability will not be uniform across regions or instance tiers at launch. On-premises customers will need to wait until Q3 2026 for general availability, so enterprise buyers with direct hardware procurement cycles will trail their cloud-hosted counterparts by several months.

The announcement continues Nvidia's cadence of refreshing its data center GPU lineup roughly annually, keeping competitive pressure on AMD's MI300X successors and Intel's Gaudi line. With inference demand accelerating across hyperscalers, the B300's power-efficiency story is likely more commercially important than raw peak performance — cloud providers are increasingly constrained by power delivery, not floor space.

Panel Takes

The Builder

Developer Perspective

“The 1.5x inference throughput claim is the number everyone will copy-paste into their infrastructure proposals, but Nvidia hasn't published the benchmark methodology, and 'equivalent power' is doing a lot of work in that sentence. What matters for developers is whether the cloud instance types that surface this hardware expose the memory bandwidth gains cleanly through existing CUDA and TensorRT toolchains without requiring re-tuning — if you have to recompile and re-profile your inference stack to see the gains, the real-world lift for most teams will be substantially less than 1.5x. I'll care more when AWS and GCP publish their instance specs and we can run actual workloads against them.”

The Skeptic

Reality Check

“Nvidia says 1.5x inference throughput over B200 at equivalent power, but that benchmark has no published methodology and comes from Nvidia itself: treat it as a best-case ceiling, not a baseline for your workloads. The real test is whether hyperscalers actually price B300 instances at a discount to B200 per token, because if the efficiency gains get absorbed as margin by AWS and Azure rather than passed to customers, the announcement is infrastructure news with no user-facing impact. The Q3 2026 on-prem GA date is the detail that matters most: enterprises buying their own hardware are being told to wait six-plus months, which means this is a cloud-lock play as much as it is a hardware launch.”

The Futurist

Big Picture

“The thesis embedded in this launch is specific and falsifiable: inference compute will remain the binding constraint on AI deployment through at least 2027, and the entity that controls the power-efficiency curve on inference hardware controls the economics of every application layer above it. The B300's power-efficiency story is Nvidia betting that rack power — not chip count, not model size — is the actual scarce resource hyperscalers are managing, and that bet looks correct given the public commentary from AWS and Google on data center power procurement. The second-order effect is that each incremental efficiency gain from Nvidia makes it harder for AMD and custom silicon players to close the gap, because the target keeps moving and the software ecosystem compounds with each generation.”

The Founder

Business & Market

“The six-month gap between cloud GA and on-prem GA is a deliberate business decision, not a supply chain accident — Nvidia and the hyperscalers both benefit from inference workloads running on metered cloud infrastructure rather than owned hardware, and this sequencing reinforces that dynamic. For startups building inference-heavy products, the relevant question is whether B300 availability translates to lower per-token costs from their cloud provider, or whether the efficiency gains get captured entirely upstream; if it's the latter, the announcement is irrelevant to their unit economics. The moat here is unchanged: Nvidia's defensibility isn't the B300 specifically, it's the CUDA ecosystem and the fact that every ML engineer alive has optimized for it.”
