UC Berkeley RDI Blog · Research · 2026-04-12

Berkeley Research: Small Models Find the Same Vulnerabilities as Anthropic's Restricted Mythos — The 'Safety Through Restriction' Argument Takes a Hit

New Berkeley research shows that standard, publicly available smaller models can identify the same cybersecurity vulnerabilities Anthropic claimed required the restricted Mythos model to find. The findings complicate the narrative around capability restriction as a safety mechanism and reignite debate about what 'dual-use restriction' actually accomplishes.


When Anthropic launched Project Glasswing in March 2026, the $100M cybersecurity initiative's headline claim was that finding the two decade-old zero-day vulnerabilities that made the initiative famous required a specially restricted model, Claude Mythos. The implication was clear: some AI capabilities are dangerous enough to require controlled, restricted access. The model can find these vulnerabilities *because* it's more capable, and that capability must be managed carefully.

New research from UC Berkeley's Responsible Decentralized Intelligence (RDI) lab challenges that narrative directly. The study found that smaller, publicly available models — none of which require special access or safety restrictions — can independently identify the same vulnerability classes that Mythos allegedly required frontier capability to find. The paper, drawing on Berkeley's ongoing work on trustworthy AI benchmarks, argues that "the marginal capability gap between frontier and near-frontier models on practical security tasks is significantly smaller than public communications suggest."

The findings land awkwardly for Anthropic and, more broadly, for the industry's emerging consensus around "capability restriction as safety." If a model you can run locally finds the same vulnerabilities as a restricted frontier model, then restriction hasn't made the dangerous capability unavailable — it's just made the branded version unavailable. Critics have called this security theater; defenders argue that even marginal friction matters at scale.

The debate has immediate policy implications. The AI safety community has increasingly converged on tiered access models — frontier capabilities go to vetted researchers, standard capabilities stay public — as a workable framework. Berkeley's research suggests that framework may be less robust than assumed, since the "dangerous" capability may already exist at smaller model scales. Whether this strengthens the case for more aggressive restrictions or simply exposes the limits of restriction-based approaches is now a live argument in AI policy circles.

Panel Takes

The Builder

Developer Perspective

As a security engineer, I've been expecting exactly this result. The threat model for AI-assisted vulnerability research was never 'only frontier models can do this' — it was always 'any sufficiently capable model, and capable-enough is a sliding scale.' This research makes the honest version of that argument in public.

The Skeptic

Reality Check

Berkeley's methodology needs scrutiny before anyone draws strong policy conclusions. 'Can identify the same vulnerability classes' is not the same as 'can do what Mythos actually did.' The devil is in the operational specifics — speed, reliability, false-positive rate. One paper isn't a policy framework.

The Futurist

Big Picture

This is the 'jagged frontier' problem playing out in real time. Capabilities we thought required frontier models keep appearing at smaller scales faster than our governance frameworks adapt. The implication isn't that restriction is pointless — it's that we need to govern the capability, not the model size.