Reuters · Funding · 2026-06-23

Scale AI Wins $2.4B DoD Contract for AI Data and Red-Teaming

Scale AI has been awarded a $2.4 billion multi-year contract by the U.S. Department of Defense to provide AI data labeling, evaluation, and red-teaming services. The deal is among the largest AI infrastructure contracts in U.S. government history.

Scale AI has secured a $2.4 billion multi-year contract with the U.S. Department of Defense, covering data labeling, model evaluation, and red-teaming services. The contract cements Scale's position as a primary infrastructure supplier for the U.S. military's AI ambitions, spanning everything from training data pipelines to adversarial testing of deployed models.

The scope of the deal reflects the DoD's strategic push to institutionalize AI evaluation rigor across its programs. Red-teaming — structured adversarial probing of AI systems for failure modes, bias, and security vulnerabilities — has become a non-negotiable gate for military AI deployments, and Scale has positioned itself as the primary independent vendor capable of doing this at classification-appropriate scale.

For Scale AI, this is a significant revenue anchor that validates its enterprise pivot away from commodity data-labeling work. The company has been building toward government and defense contracts since at least 2021, with CEO Alexandr Wang making Washington relationships a visible priority. A contract of this size also buys Scale time and capital to invest in the proprietary tooling and workforce that make switching costs real.

The deal raises questions about market concentration in government AI infrastructure. With Scale now holding a dominant contract position, competitors in the data labeling and evaluation space — including Surge AI, Labelbox, and emerging red-teaming specialists — face a significant credentialing gap to close if they want comparable DoD access.

Panel Takes

The Founder

Business & Market

The buyer here is explicitly the U.S. federal government, which means the budget is defense appropriations, one of the most durable, recession-proof line items that exist. Scale's moat isn't the labeling tech, which is replicable; it's the cleared workforce, the facility authorizations, and the institutional trust that takes years to build and almost never gets rebuilt from scratch by a competitor. The real question is whether Scale can defend against the DoD eventually building this capability in-house, but given how badly that has gone historically with government IT programs, I'd bet on the contract renewing.

The Skeptic

Reality Check

$2.4 billion sounds enormous until you spread it across multiple years and remember that government contracts routinely get restructured, have options that never get exercised, and come with compliance overhead that eats margin. The real question is what percentage of this is firm-fixed-price versus indefinite-delivery, and Scale hasn't disclosed that. What kills this in 12 months isn't a competitor — it's a budget continuing resolution, a change in DoD AI leadership priorities, or a GAO protest from a losing bidder that freezes spend while litigation drags on.

The Futurist

Big Picture

The thesis embedded in this contract is that AI evaluation and red-teaming will become a permanent, recurring infrastructure cost for every major government AI deployment — not a one-time audit, but an ongoing operational function, the same way penetration testing became standard for software security. If that's true, Scale just claimed the highest-value position in the AI supply chain: not the model builder, not the app layer, but the certifying authority. The second-order effect is that whoever holds this contract effectively sets the evaluation standards that all DoD AI vendors will be benchmarked against, which is a form of market power that doesn't show up on a revenue slide.

The PM

Product Strategy

The job-to-be-done here is brutally clear: the DoD needs AI systems it can trust enough to deploy in consequential contexts, and it needs a vendor who can credibly say a system passed adversarial review. Scale's product bet is that evaluation and red-teaming are distinct, repeatable jobs with enough process complexity to require dedicated tooling and workforce — not something a general-purpose contractor can spin up. The completeness concern I'd raise is whether Scale's red-teaming practice is deep enough to cover the full range of military AI use cases, from logistics to autonomous systems, or whether this contract will expose gaps that a narrower commercial focus didn't surface.
