Sakana AI Scientist 2 Writes ML Papers That Pass Peer Review

Sakana AI has launched AI Scientist 2, an autonomous system that generates hypotheses, runs experiments, and writes complete machine learning papers, some of which independent researchers say meet workshop submission standards. It represents a meaningful step toward closing the loop on AI-generated scientific output.


Sakana AI has released AI Scientist 2, the second iteration of its autonomous research system designed to handle the full pipeline of machine learning research: hypothesis generation, experimental design, code execution, results analysis, and manuscript writing. The company claims the system produces papers that meet the technical bar for peer review, and early evaluations by external researchers support the claim that some of its output is workshop-submission-ready. That is notable given that the system operates with minimal human intervention.

The original AI Scientist, released in 2024, attracted significant attention as a proof-of-concept but was criticized for producing papers with reproducibility issues and factual errors. AI Scientist 2 appears to address some of those shortcomings with tighter experimental scaffolding and improved self-evaluation loops, though Sakana has not released a rigorous breakdown of failure rates or the proportion of generated papers that actually meet the stated quality bar.

What makes this notable beyond prior autonomous writing tools is the closed-loop design: the system doesn't just write about research, it actually runs the experiments its hypotheses require and incorporates real results. This distinguishes it from large language model pipelines that synthesize existing literature without generating new empirical data. Whether the science it produces is novel enough to advance the field — versus being technically correct but incrementally trivial — remains an open question that external review will need to answer.
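
To make the closed-loop idea concrete, here is a minimal Python sketch. It is not Sakana's implementation: the `ResearchLoop` class and the `propose`, `run_experiment`, and `write_up` callables are hypothetical names invented for illustration, and the "experiment" is a toy scoring function standing in for a real training run. The point it shows is only the control flow: each new hypothesis is proposed with access to measured results from earlier rounds, and the write-up is built from those measurements rather than from existing literature.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Attempt:
    hypothesis: str
    metric: float  # value measured by actually running the experiment


@dataclass
class ResearchLoop:
    # All three callables are hypothetical stand-ins, not a real API.
    propose: Callable[[List[Attempt]], str]
    run_experiment: Callable[[str], float]
    write_up: Callable[[List[Attempt]], str]
    history: List[Attempt] = field(default_factory=list)

    def run(self, rounds: int) -> str:
        for _ in range(rounds):
            hypothesis = self.propose(self.history)   # sees earlier results
            metric = self.run_experiment(hypothesis)  # produces new data
            self.history.append(Attempt(hypothesis, metric))
        return self.write_up(self.history)


# Toy stage implementations, assumed purely for illustration.
def propose(history: List[Attempt]) -> str:
    return f"learning rate = {0.1 / (len(history) + 1):.3f}"


def run_experiment(hypothesis: str) -> float:
    lr = float(hypothesis.split("=")[1])
    return round(1.0 - abs(lr - 0.05), 3)  # stand-in for a training run


def write_up(history: List[Attempt]) -> str:
    best = max(history, key=lambda a: a.metric)
    return f"Best of {len(history)} runs: {best.hypothesis} (score {best.metric})"


loop = ResearchLoop(propose, run_experiment, write_up)
report = loop.run(rounds=3)
print(report)
```

A literature-synthesis pipeline, by contrast, would have no `run_experiment` step at all: its write-up would draw only on text that already exists.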

The broader implication is one of research economics. If a system can reliably produce workshop-quality ML papers at scale, it applies pressure to conference review systems already strained by submission volume, and raises hard questions about authorship, credit attribution, and what peer review is actually gatekeeping. Sakana has not yet published details on how AI Scientist 2 handles attribution or whether submissions generated by the system would be disclosed as such.

Panel Takes

The Skeptic

Reality Check

“'Peer-review-ready' is doing enormous work in this headline, and 'some papers technically sound enough for workshop tracks' is a very different claim than producing research that advances the field. The closest competitor here is a well-prompted GPT-4o with a code interpreter and a LaTeX template — the question is whether the closed-loop experiment execution actually produces novel empirical findings or just dresses up LLM confabulation in results tables. I'd want to see the proportion of generated papers that passed external review versus the total generated, not just the cherry-picked successes, before calling this anything more than a very expensive demo.”

The Futurist

Big Picture

“The thesis here is falsifiable and consequential: that the bottleneck in ML research is human researcher-hours, not insight, and that a system which can run real experiments and write about them closes that gap at scale. If that's true, the second-order effect isn't faster research — it's that conference review systems collapse under autonomous submission volume within 18 months, forcing a structural rethink of how the field validates knowledge. This tool is riding the trend of compute-cheap experimentation, and it's early enough that Sakana could define the norms before regulators or conferences do, which is either a massive opportunity or a massive liability depending on how responsibly they move.”

The Founder

Business & Market

“The buyer here is ambiguous in a way that should concern anyone thinking about the business model — is this sold to individual researchers, labs, or enterprises trying to generate IP? Each of those is a completely different sales motion, pricing structure, and defensibility story. The moat question is harder: Sakana's value is in the orchestration layer and experimental scaffolding, but if Anthropic or OpenAI ships native 'run this experiment and write it up' tooling — which they will — the differentiation evaporates unless Sakana has locked in institutional relationships or proprietary training data from the research loops. I'd want to see the pricing page and a clear answer on who actually writes the check before calling this a business rather than a research project with a press release.”

The PM

Product Strategy

“The job-to-be-done is clear — automate the research-to-manuscript pipeline for ML practitioners — but the product has a completeness problem: a system that produces papers at scale without a built-in disclosure and attribution workflow isn't a complete product, it's a liability waiting to happen for any institution that adopts it. The first two minutes of onboarding for a real researcher aren't 'generate a paper,' they're 'how do I disclose this to my institution, conference, and co-authors,' and if that answer isn't built into the product, Sakana is shipping half the job. The opinion the product needs — and doesn't yet seem to have — is a clear stance on how AI-generated research gets credited, because without that stance, every power user is on their own to figure it out.”
