AI tool comparison
Talkie vs Talkie
Which one should you ship with? Here is the side-by-side panel verdict, pricing read, reviewer split, and community vote comparison.
Research
Talkie
A 13B LLM trained exclusively on texts from before 1931
Panel ship: 75%
Community: n/a
Entry: Free
Talkie is a 13-billion-parameter language model trained exclusively on English-language texts published before 1931, and the largest vintage language model built to date. Created by researchers Nick Levine, David Duvenaud (University of Toronto), and Alec Radford (of GPT and DALL-E fame), it takes a novel approach to studying what training data actually does to a model. The research insight is elegant: modern LLMs are so thoroughly saturated with contemporary internet data, whether directly or through distillation, that it is nearly impossible to separate what a model can work out for itself from what it simply absorbed during training. Talkie addresses this by cutting off the training corpus at 1931, before digital computers existed. That lets the team run controlled experiments that are impossible with contemporary models, such as teaching the model to write Python from examples alone and measuring how quickly it generalizes. Talkie was trained on roughly 260 billion tokens of historical text, then fine-tuned with direct preference optimization, using Claude as the judge, on structured historical documents such as etiquette manuals and letter-writing guides. It is openly available on Hugging Face for research use. It also happens to produce wonderfully formal, slightly anachronistic prose.
Research
Talkie
A 13B LLM trained only on pre-1931 text — by design
Panel ship: 75%
Community: n/a
Entry: Free
Talkie is a 13-billion-parameter language model with an unusual constraint: it was trained exclusively on text written before 1931. That means no internet, no Wikipedia, and no modern code, just 260 billion tokens of books, newspapers, journals, patents, and case law from before the cutoff. The result is a "vintage" LLM that writes as if it were from the early 20th century and has no knowledge of anything after its cutoff. The model was built by Nick Levine, David Duvenaud, and Alec Radford (yes, one of the original GPT authors), with support from Anthropic and Coefficient Giving. The scientific motivation is rigorous: because Talkie has never seen Python, researchers can cleanly test how a model generalizes to unfamiliar tasks from examples alone. They can also study future-prediction capabilities without data leakage and examine how training data diversity shapes model dispositions and values. An instruction-tuned version, trained on synthetic data derived from historical etiquette manuals and cookbooks, makes actual conversation possible. The model is available free on Hugging Face, with a live chat demo on the project's site. A larger variant is planned for summer 2026.
Reviewer scorecard
“The ability to test code-learning from scratch on a model that's never seen a modern codebase is genuinely useful for ML research. The methodology here is cleaner than anything I've seen for studying data contamination.”
“This is one of the most scientifically interesting model releases I've seen. A clean pre-1931 cutoff gives researchers a genuinely controlled environment for studying generalization, data contamination, and in-context learning — problems that plague every other benchmark we have.”
“Fascinating as a research artifact, but this isn't a production model. The limited vocabulary and cultural frame mean it's not useful for most practical tasks. It's a museum piece, not a tool.”
“This is a research artifact, not a tool. Unless you're studying AI generalization or historical NLP, there's nothing here for practitioners. The 'it speaks like 1930' angle is fun for demos but the actual scientific payoff is years from materializing into anything usable.”
“This is exactly the kind of fundamental research the field needs. Understanding what training data does to language models — not just benchmark scores — is critical as we scale to more powerful systems. Radford's involvement adds serious credibility.”
“Alec Radford doesn't build toys. A model trained this carefully to isolate temporal knowledge enables experiments we genuinely can't run any other way — like testing whether a model can predict future events from historical patterns alone. This could reframe how we think about benchmark contamination.”
“The prose it generates has a formal, unhurried quality that modern LLMs can't replicate. For period-accurate creative writing, historical fiction, or vintage-voice content, Talkie is the only model worth using.”
“Writers working on historical fiction or period-accurate dialogue have a dream tool here. A model that only knows 1930s-era language and references can help maintain authentic voice without accidentally slipping in modern idioms. That's a genuinely useful creative constraint.”