AI tool comparison
AI-Scientist-v2 vs Talkie
Which one should you ship with? Here is a side-by-side look at the panel verdict, pricing, reviewer split, and community vote.
Research & Science
AI-Scientist-v2
Sakana AI's autonomous agent that writes research papers for peer review
Panel ship: 50% | Community: n/a | Entry price: Free
AI-Scientist-v2 is Sakana AI's second-generation autonomous research system. It generates scientific papers end-to-end, from hypothesis formation through experimentation, data analysis, and manuscript writing, and it is historically notable for producing the first AI-authored paper accepted through peer review at a workshop. Unlike the original, v2 does not rely on human-authored templates; instead it runs a progressive agentic tree search guided by an experiment manager agent. This makes it more exploratory across ML domains, though Sakana acknowledges the trade-off: v1's template-driven runs succeeded more often, while v2 generalizes more broadly at a lower per-run success rate. A full research run costs roughly $20-25 using Claude 3.5 Sonnet. The system integrates with Semantic Scholar for literature review and supports OpenAI and Gemini models, as well as Claude via AWS Bedrock. Its custom license requires disclosure of AI use in any resulting publication, a meaningful ethical constraint for a system that could otherwise flood conferences with AI-generated submissions.
Research
Talkie
A 13B LLM trained only on pre-1931 text, by design
Panel ship: 75% | Community: n/a | Entry price: Free
Talkie is a 13-billion-parameter language model with an unusual constraint: it was trained exclusively on text written before 1931. That means no internet, no Wikipedia, and no modern code, just 260 billion tokens of books, newspapers, journals, patents, and case law published before the cutoff. The result is a "vintage" LLM that speaks like someone from the early 20th century and has no knowledge of anything after its cutoff. The model was built by Nick Levine, David Duvenaud, and Alec Radford (yes, one of the original GPT authors) with support from Anthropic and Coefficient Giving. The scientific motivation is concrete: Talkie lets researchers cleanly test how models generalize to unfamiliar tasks from examples alone (it has never seen Python), study forecasting ability without data leakage, and understand how the diversity of training data shapes a model's dispositions and values. An instruction-tuned version, trained on synthetic data derived from historical etiquette manuals and cookbooks, supports actual conversation. The model is available free on Hugging Face, with a live chat demo on the project's site. A larger variant is planned for summer 2026.
Reviewer scorecard
“For ML research teams, the $20-25 per run cost to get a draft paper with experiments is genuinely interesting as an ideation tool. The tree search approach that explores multiple experimental directions in parallel is the kind of thing that would take a grad student weeks.”
“This is one of the most scientifically interesting model releases I've seen. A clean pre-1931 cutoff gives researchers a genuinely controlled environment for studying generalization, data contamination, and in-context learning — problems that plague every other benchmark we have.”
“Sakana's own documentation says v2 has lower success rates than v1 and is 'more exploratory.' Paying $25 for a failed research run with no guarantee of a usable output isn't a workflow most researchers will adopt. The peer review acceptance was a workshop paper — the lowest bar in academic publishing.”
“This is a research artifact, not a tool. Unless you're studying AI generalization or historical NLP, there's nothing here for practitioners. The 'it speaks like 1930' angle is fun for demos but the actual scientific payoff is years from materializing into anything usable.”
“This is the beginning of AI as a genuine research collaborator, not just a writing assistant. Within five years, AI-generated hypotheses tested by autonomous agents will be standard practice in computational fields. AI-Scientist-v2 is primitive version 0.2 of that future.”
“Alec Radford doesn't build toys. A model trained this carefully to isolate temporal knowledge enables experiments we genuinely can't run any other way — like testing whether a model can predict future events from historical patterns alone. This could reframe how we think about benchmark contamination.”
“Science communication is a craft, and the idea of fully automating it makes me uncomfortable. The best papers are ones where researchers deeply understand and can defend every methodological choice — a system that writes the paper for you undermines that accountability.”
“Writers working on historical fiction or period-accurate dialogue have a dream tool here. A model that only knows 1930s-era language and references can help maintain authentic voice without accidentally slipping in modern idioms. That's a genuinely useful creative constraint.”