Nvidia DGX Cloud Lepton: On-Demand GPU Marketplace for AI Devs

Nvidia launched DGX Cloud Lepton, a spot-market-style GPU compute marketplace offering on-demand H100 and B200 cluster access with per-minute billing and 30-second cold-start times. The platform integrates with Weights & Biases and MLflow, positioning itself as a flexible alternative to reserved cloud GPU contracts.


Nvidia has unveiled DGX Cloud Lepton, a marketplace that lets AI developers provision H100 and B200 GPU clusters on demand without long-term commitments. The platform uses a spot-market pricing model with per-minute billing, targeting teams that need burst capacity for training runs, experiments, and inference workloads without paying for idle reserved instances.
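The trade-off between per-minute billing and idle reserved capacity comes down to simple arithmetic. The sketch below works through it under stated assumptions: the hourly rates are hypothetical placeholders, not Lepton or competitor prices (Nvidia has not fully disclosed pricing), and the function names are illustrative only.

```python
"""Rough cost comparison: per-minute on-demand billing vs. a reserved block.

All rates below are HYPOTHETICAL placeholders chosen for illustration;
they are not published prices for any provider.
"""


def per_minute_cost(minutes_used: float, rate_per_gpu_hour: float, gpus: int = 1) -> float:
    """Cost when you are billed only for the minutes the GPUs actually run."""
    return minutes_used * (rate_per_gpu_hour / 60.0) * gpus


def reserved_cost(hours_reserved: float, rate_per_gpu_hour: float, gpus: int = 1) -> float:
    """Cost of a reserved block, paid for whether or not the GPUs are busy."""
    return hours_reserved * rate_per_gpu_hour * gpus


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the reserved period the GPUs must be busy before
    reserving becomes cheaper than paying the on-demand rate."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return reserved_rate / on_demand_rate


# Hypothetical rates in dollars per GPU-hour (assumptions, not real prices).
ON_DEMAND_RATE = 4.00
RESERVED_RATE = 2.50

# A 90-minute burst job on 8 GPUs vs. holding the same 8 GPUs for a full day.
burst = per_minute_cost(90, ON_DEMAND_RATE, gpus=8)
held = reserved_cost(24, RESERVED_RATE, gpus=8)
print(f"90-minute burst on demand: ${burst:.2f}")
print(f"24-hour reserved block:    ${held:.2f}")
print(f"Break-even utilization:    {break_even_utilization(ON_DEMAND_RATE, RESERVED_RATE):.1%}")
```

With these placeholder rates, a team whose GPUs sit busy less than the break-even fraction of the time would pay less on demand. The real answer depends entirely on actual prices and on whether capacity is available when the burst is needed.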

The headline technical claim is a 30-second cold-start time: a cluster should be ready to accept jobs within half a minute of the provisioning request. Nvidia also advertises native integrations with popular MLOps tooling, including Weights & Biases for experiment tracking and MLflow for model lifecycle management, which should reduce the glue code developers typically write to connect compute to their observability stack.
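A cold-start figure like this can be checked independently by timing the interval between the provisioning request and the first successful readiness probe. The following is a minimal sketch of that measurement, not Nvidia's methodology (none has been published). Both `provision` and `probe` are hypothetical callables you would supply yourself, since the source shows no Lepton API call; a probe might, for example, run `nvidia-smi` on the new cluster over SSH and return True on a zero exit code.

```python
import time
from typing import Any, Callable


def measure_cold_start(
    provision: Callable[[], Any],
    probe: Callable[[Any], bool],
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Return seconds from the provisioning request to the first successful probe.

    provision: HYPOTHETICAL callable that requests a cluster and returns a handle.
    probe:     HYPOTHETICAL callable that returns True once the cluster accepts work.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()  # monotonic clock: unaffected by wall-clock adjustments
    handle = provision()
    while True:
        if probe(handle):
            return clock() - start
        if clock() - start > timeout_s:
            raise TimeoutError(f"cluster not ready after {timeout_s} s")
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated stand-ins so the sketch runs anywhere: the fake cluster
    # reports ready on the third poll.
    polls = {"count": 0}

    def fake_provision() -> str:
        return "cluster-handle"

    def fake_probe(_handle: str) -> bool:
        polls["count"] += 1
        return polls["count"] >= 3

    elapsed = measure_cold_start(fake_provision, fake_probe, poll_interval_s=0.01)
    print(f"simulated cold start: {elapsed:.3f} s")
```

The measured value includes the polling granularity, so the poll interval should be small relative to the figure being tested, and repeated runs at different times of day would say more than a single sample.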

Lepton competes directly with spot GPU markets from AWS, CoreWeave, Lambda Labs, and RunPod, all of which have been expanding H100 availability aggressively over the past year. Nvidia's differentiator here is its vertical position: as the GPU manufacturer, it controls the full stack from silicon to software, and DGX Cloud gives it a direct commercial relationship with AI developers that has historically been mediated by cloud providers.

The launch represents a notable strategic move for Nvidia — one that puts it in more direct competition with its own largest customers, the hyperscalers. Whether per-minute billing and fast cold-starts are enough to pull workloads away from AWS and GCP depends heavily on pricing transparency and actual availability during peak demand periods, neither of which Nvidia has fully disclosed at launch.

Panel Takes

The Builder

Developer Perspective

“The primitive here is straightforward: a GPU spot market with a faster provisioning API than what the hyperscalers ship. The 30-second cold-start claim is the only number that matters to me, and I won't believe it until I've run `nvidia-smi` in a freshly provisioned cluster and looked at the timestamp delta myself — Nvidia has not published any methodology or test conditions for that figure. The W&B and MLflow integrations are the right DX bet if they're first-class and not just 'set these four environment variables,' but the landing page doesn't show me an actual API call, which is a yellow flag.”

The Skeptic

Reality Check

“The direct competitors are CoreWeave and Lambda Labs, both of which already offer on-demand H100 access with per-minute billing, so the category exists and the feature set isn't novel. The scenario where Lepton breaks is peak demand: spot markets by definition have capacity constraints, and Nvidia hasn't disclosed what happens to that 30-second cold-start figure, which is a marketing claim rather than a published SLA, when every foundation model lab is trying to run evals simultaneously. What kills this in 12 months is not a competitor but Nvidia's own largest customers (AWS, GCP, and Azure) quietly throttling access or bundling competing reserved GPU deals with broader enterprise discounts to keep workloads from migrating to Lepton.”

The Founder

Business & Market

“The buyer here is the ML engineer or infra lead at a startup or mid-market AI company who has already burned money on idle reserved instances and wants burst capacity without a commitment — that's a real budget line and a real pain point. The moat is not the marketplace mechanics, which CoreWeave can copy; it's that Nvidia as the silicon vendor can structurally offer better allocation priority and potentially lower floor pricing than resellers, which is a durable structural advantage if they choose to use it. The existential risk is channel conflict: AWS and Google are Nvidia's biggest GPU customers by volume, and if Lepton meaningfully pulls enterprise workloads, those relationships get complicated fast.”

The Futurist

Big Picture

“The thesis Nvidia is betting on is specific and falsifiable: AI training and inference workloads will increasingly look like burst compute, meaning short, high-intensity jobs rather than long-running reserved clusters, and the team or vendor that owns the provisioning layer owns the developer relationship. That bet only pays off if model training continues to fragment into smaller, more frequent runs, as it is doing in fine-tuning and RL post-training right now, rather than consolidating into fewer massive pre-training jobs owned by hyperscalers. The second-order effect that nobody is talking about: if Lepton succeeds, Nvidia gains direct telemetry on how AI workloads actually run at scale, a data asset that could inform its future hardware and software decisions in ways no single cloud provider's view can match.”
