Nvidia GB300 NVL72 Arrives on AWS and Azure as Cloud Instances

Nvidia has announced that its GB300 NVL72, a rack-scale system built from Grace Blackwell Superchips that links 72 GPUs in a single NVLink domain, is now available as cloud instances on Amazon Web Services and Microsoft Azure. The move brings high-density, NVLink-connected GPU capacity to enterprises without the capital and operational overhead of an on-premises deployment. On both platforms, early access currently requires enterprise enrollment.

The headline performance claim is a 1.5x improvement in inference throughput compared to H100 clusters for large language model workloads. That figure, cited by Nvidia, is framed around LLM serving scenarios where the NVL72's all-to-all NVLink bandwidth matters most — specifically long-context and multi-turn inference where GPU-to-GPU communication becomes the bottleneck rather than raw compute. Nvidia has not published a detailed methodology document alongside the announcement.
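Nvidia has not said how much of the 1.5x comes from the interconnect and how much from compute. A simple Amdahl-style model shows why that split matters. The sketch below is illustrative only: the function name `end_to_end_speedup`, the communication fractions, and the 4x interconnect gain are hypothetical inputs, not figures from Nvidia, AWS, or Azure. The model also assumes that communication and compute do not overlap.

```python
def end_to_end_speedup(comm_fraction: float, interconnect_speedup: float,
                       compute_speedup: float = 1.0) -> float:
    """Amdahl-style estimate of overall speedup (hypothetical model).

    comm_fraction: share of baseline step time spent on GPU-to-GPU
        communication, between 0 and 1 (assumed not to overlap with compute).
    interconnect_speedup: how much faster communication runs on the new fabric.
    compute_speedup: how much faster the non-communication work runs.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if interconnect_speedup <= 0 or compute_speedup <= 0:
        raise ValueError("speedups must be positive")
    new_time = ((1.0 - comm_fraction) / compute_speedup
                + comm_fraction / interconnect_speedup)
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical workload mixes: the same interconnect gain matters far more
    # when communication dominates, as in long-context or multi-turn serving.
    for label, f in [("compute-bound", 0.1), ("balanced", 0.5), ("comm-bound", 0.8)]:
        print(f"{label:>13}: {end_to_end_speedup(f, interconnect_speedup=4.0):.2f}x")
```

With these made-up inputs, the estimate ranges from roughly 1.08x for a compute-bound workload to 2.5x for a communication-bound one. A single headline multiplier therefore says little without the workload mix behind it.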

For cloud buyers, the GB300 NVL72 represents a different consumption model than renting individual GPU instances. The rack-scale unit operates as a single logical system, meaning workload scheduling and memory pooling happen across the full 72-GPU fabric rather than across discrete nodes stitched together over Ethernet or InfiniBand. Whether cloud providers expose that full coherence to tenants, or abstract it away behind standard instance APIs, will determine how much of the theoretical hardware advantage actually reaches users.
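A tenant could start answering that question by probing what an instance actually exposes. The sketch below assumes PyTorch with CUDA support is installed. It uses `torch.cuda.can_device_access_peer` to build a peer-access matrix for the GPUs visible to one instance, and the helper names are hypothetical. Treat it only as a first check. Peer access can also run over PCIe, so a positive result does not prove an NVLink path (`nvidia-smi topo -m` reports link types). The probe also cannot see GPUs attached to other instances in the same rack.

```python
from typing import Callable, List, Optional


def peer_access_matrix(
    num_devices: int,
    can_access: Optional[Callable[[int, int], bool]] = None,
) -> List[List[bool]]:
    """matrix[i][j] is True if GPU i can directly access GPU j's memory.

    By default this assumes PyTorch with CUDA and calls
    torch.cuda.can_device_access_peer; pass a stub to run it without a GPU.
    """
    if can_access is None:
        import torch  # imported lazily so the helpers work without PyTorch

        can_access = torch.cuda.can_device_access_peer
    return [
        [i == j or bool(can_access(i, j)) for j in range(num_devices)]
        for i in range(num_devices)
    ]


def is_fully_connected(matrix: List[List[bool]]) -> bool:
    """True when at least one GPU is visible and every visible GPU can reach
    every other visible GPU peer-to-peer."""
    return bool(matrix) and all(all(row) for row in matrix)


if __name__ == "__main__":
    # Stub probe for illustration only: pretends 4 GPUs that all reach each other.
    # On a real instance: peer_access_matrix(torch.cuda.device_count())
    matrix = peer_access_matrix(4, can_access=lambda i, j: True)
    print(is_fully_connected(matrix))
```

A missing peer entry between GPUs in the same instance is the kind of result that would suggest the hardware advantage is not fully reaching the tenant.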

Nvidia's dual-cloud rollout with AWS and Azure is consistent with its strategy of making infrastructure available across the major hyperscalers simultaneously, which avoids any perception of favoritism while maximizing total addressable cloud revenue. Its predecessor, the GB200 NVL72, reached providers on a staggered schedule; the GB300 launch appears more coordinated. Pricing has not been disclosed publicly, and both AWS and Azure are directing interested enterprises to their account teams.
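Until list prices appear, buyers can only model cost per token with placeholder inputs. The sketch below makes that arithmetic explicit. Every price and throughput figure in it is a hypothetical placeholder, and the helper names `cost_per_million_tokens` and `breakeven_hourly_price` are illustrative.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD per one million generated tokens at a sustained serving throughput."""
    if hourly_price_usd < 0:
        raise ValueError("hourly price must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600.0
    return hourly_price_usd / tokens_per_hour * 1_000_000


def breakeven_hourly_price(baseline_hourly_usd: float, throughput_gain: float) -> float:
    """Hourly price at which a system with `throughput_gain` times the baseline
    throughput matches the baseline's cost per token (cost ~ price / throughput)."""
    if throughput_gain <= 0:
        raise ValueError("throughput gain must be positive")
    return baseline_hourly_usd * throughput_gain


if __name__ == "__main__":
    # Hypothetical placeholders only; no GB300 NVL72 pricing has been published.
    baseline = cost_per_million_tokens(hourly_price_usd=100.0, tokens_per_second=10_000)
    candidate = cost_per_million_tokens(hourly_price_usd=140.0, tokens_per_second=15_000)
    print(f"baseline:  ${baseline:.2f} per 1M tokens")
    print(f"candidate: ${candidate:.2f} per 1M tokens")
    print(f"break-even hourly price at 1.5x: ${breakeven_hourly_price(100.0, 1.5):.2f}")
```

Cost per token scales with price divided by throughput. A 1.5x throughput gain therefore lowers cost per token only if the hourly price premium stays below 1.5x at comparable capacity, and `breakeven_hourly_price` simply encodes that ratio.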

Panel Takes

The Builder

Developer Perspective

“The primitive here is a 72-GPU NVLink fabric exposed as a cloud instance — which is genuinely different from strapping H100s together over InfiniBand and hoping your NCCL config holds. The critical unknown is whether AWS and Azure surface the full NVLink coherence through their instance APIs or hand you a standard CUDA device list and call it a day. If the answer is the latter, the 1.5x throughput claim lives on Nvidia's benchmark rig, not in your production cluster.”

The Skeptic

Reality Check

“Nvidia is citing a 1.5x inference throughput improvement with no published methodology, no independent benchmark, and an 'early access via account team' purchasing flow, which is three yellow flags in a row. The direct competitors here are the H100 and H200 instances already running production workloads at scale on both clouds, and switching requires trusting a number Nvidia generated about its own hardware. What kills this in 12 months isn't competition; it's pricing opacity. If enterprises can't model the cost-per-token improvement before committing, adoption stalls at the pilot stage.”

The Futurist

Big Picture

“The thesis Nvidia is betting on is that LLM serving will increasingly be bottlenecked by GPU interconnect bandwidth rather than raw FLOPS, making rack-scale coherent memory a durable architectural advantage over collections of discrete nodes. That bet is plausible given where context windows and multi-agent inference patterns are heading. The second-order effect nobody is talking about is that rack-scale cloud instances start to blur the line between cloud and HPC, which changes how ML infrastructure teams are hired and how inference costs are accounted for. Nvidia is on time for this trend, but the bet only pays off if hyperscalers don't abstract away the NVLink advantage in their virtualization layers to simplify their own operations.”

The Founder

Business & Market

“The buyer here is the VP of ML Infrastructure at a large enterprise, drawing on a cloud compute budget that already includes committed AWS or Azure spend. That means Nvidia is riding existing procurement relationships rather than creating new ones, and that's smart distribution. The moat is hardware scarcity and NVLink IP, not software, so the business question is whether Nvidia can keep GB300 supply tight enough to sustain premium pricing before its next-generation successor renders this conversation obsolete in 18 months. The 'contact your account team for pricing' approach tells you everything about margin strategy: this is not a commodity product, and Nvidia intends to keep it that way for as long as the waitlist holds.”
