AI tool comparison
Darwin-4B-David vs Nothing Ever Happens
Which one should you ship with? Here is the side-by-side panel verdict, pricing read, reviewer split, and community vote comparison.
AI Models
Darwin-4B-David
4.5B merged model beats Gemma-4-31B on GPQA — no training needed
Panel ship: 75%
Community: n/a
Entry: Paid
Darwin-4B-David is a 4.5-billion-parameter model that scores 85.0% on GPQA Diamond, edging out Google's Gemma-4-31B (84.3%) at roughly one-seventh the parameter count. The kicker: it required no training whatsoever. It was built in 45 minutes on a single H100 using MRI-guided DARE-TIES model merging, a variant of the drop-and-rescale (DARE) and trim-and-elect-sign (TIES) merge techniques. The MRI-guided approach uses activation analysis to identify which parameters in each source model are most critical, then applies DARE-TIES merging only to those high-value weight regions. The aim is to avoid the interference between source models that usually degrades merged results. The outcome is a small model that inherits the strengths of multiple source models without the compute cost of fine-tuning. For the AI community, this is a meaningful data point: model merging continues to close the gap with expensive training runs. Darwin-4B-David suggests that thoughtful merge strategies can extract strong benchmark performance from models that are a fraction of the size, making capable AI more accessible on consumer hardware.
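To make the merging idea concrete, here is a minimal sketch of generic DARE-TIES on flat weight lists, not the Darwin-4B-David pipeline itself. It assumes every source model shares the base model's shape. The `mask` argument is a hypothetical stand-in for the MRI-guided selection step, whose real criteria are not described here, and all function names are illustrative.

```python
import random

def dare(delta, drop_prob, rng):
    """DARE step: randomly drop entries of a task vector and rescale the
    survivors by 1 / (1 - drop_prob) so the expected value is preserved."""
    scale = 1.0 / (1.0 - drop_prob)
    return [d * scale if rng.random() >= drop_prob else 0.0 for d in delta]

def ties_merge(deltas):
    """TIES-style step: per parameter, elect the sign with the larger total
    magnitude, then average only the non-zero values that agree with it."""
    merged = []
    for values in zip(*deltas):
        sign = 1.0 if sum(values) >= 0 else -1.0
        agreeing = [v for v in values if v != 0.0 and v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

def dare_ties(base, source_models, drop_prob=0.5, mask=None, seed=0):
    """Merge source models into the base weights.

    mask is an ASSUMPTION: a list of booleans marking the parameters that
    should receive merged updates. Unmarked parameters keep the base value.
    """
    rng = random.Random(seed)
    # Task vectors: what each source model changed relative to the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in source_models]
    deltas = [dare(d, drop_prob, rng) for d in deltas]
    merged_delta = ties_merge(deltas)
    if mask is None:
        mask = [True] * len(base)
    return [b + d if m else b for b, d, m in zip(base, merged_delta, mask)]

if __name__ == "__main__":
    base = [0.0, 0.0, 0.0]
    sources = [[1.0, -2.0, 0.5], [3.0, 1.0, 0.5]]
    print(dare_ties(base, sources, drop_prob=0.5, mask=[True, False, True]))
```

The sign election is what limits interference: when two source models pull a weight in opposite directions, only the side with the larger total magnitude contributes to the merged value.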
AI Experiments
Nothing Ever Happens
An autonomous bot that always bets 'No' on Polymarket doom predictions and aims to profit from it
Panel ship: 75%
Community: n/a
Entry: Free
Nothing Ever Happens is a deliberately simple autonomous trading bot that buys "No" contracts on Polymarket prediction markets, specifically targeting non-sports questions about dramatic or catastrophic events. The thesis: humans systematically overestimate the probability that scary predicted events will actually happen. The bot filters markets using LLM-based criteria to exclude sports (where outcomes are more unpredictable) and focuses on the long tail of geopolitical, tech, and social predictions that tend toward "nothing happens." Built by Sterling Crispin (an artist and technologist known for his work on Apple Vision Pro), the project is equal parts satirical commentary and functional trading system. It logs all positions, P&L, and reasoning chains so you can audit its decisions. The name references an internet phrase mocking catastrophist news cycles: "nothing ever happens" is the skeptic's rebuttal to perpetual crisis framing. The Hacker News post hit 370 points and 180+ comments in a few hours, sparking genuine debate about whether this is a sound strategy, a fun toy, or a comment on prediction market epistemology. Real-world results aren't yet published, but the idea of using an LLM as a "doom filter" for prediction markets is novel enough to be worth watching.
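The thesis comes down to simple arithmetic, sketched below under stated assumptions: a "No" share costs one minus the "Yes" price and pays $1 if the event does not occur, with fees and slippage ignored. The function names, the market dictionary fields, and the `is_sports` callable (a stand-in for the bot's LLM filter) are all hypothetical and not taken from the project's code.

```python
def no_position_payoffs(yes_price):
    """Return (profit if nothing happens, profit if the event happens)
    for one "No" share bought at 1 - yes_price. Assumes a $1 payout."""
    no_price = 1.0 - yes_price
    return 1.0 - no_price, -no_price

def no_position_ev(yes_price, true_yes_prob):
    """Expected profit per "No" share given an assumed true event probability."""
    win, loss = no_position_payoffs(yes_price)
    return (1.0 - true_yes_prob) * win + true_yes_prob * loss

def pick_markets(markets, is_sports):
    """Keep non-sports markets. is_sports is a hypothetical classifier
    callable standing in for the LLM-based filter."""
    return [m for m in markets if not is_sports(m["question"])]

if __name__ == "__main__":
    markets = [
        {"question": "Will a major exchange collapse this year?", "yes_price": 0.10},
        {"question": "Will the home team win the final?", "yes_price": 0.55},
    ]
    # Toy keyword check used only for this demo, not a real filter.
    chosen = pick_markets(markets, lambda q: "team" in q.lower())
    for m in chosen:
        win, loss = no_position_payoffs(m["yes_price"])
        ev = no_position_ev(m["yes_price"], true_yes_prob=0.05)
        print(m["question"], round(win, 4), round(loss, 4), round(ev, 4))
```

The payoffs are lopsided: a market priced at 10% "Yes" earns about 10 cents per share when nothing happens and loses about 90 cents when something does. The edge exists only if the true probability really is below the market price.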
Reviewer scorecard
“45 minutes on a single H100 to beat a 31B parameter model? That's an extraordinary efficiency ratio. MRI-guided merging is a technique I'll be watching closely. If this holds up across more benchmarks, it fundamentally changes how teams should think about building capable small models.”
“Clean architecture, good logging, and a legitimately interesting hypothesis about prediction market psychology. The LLM filtering layer for 'doom vs. non-doom' questions is a smart abstraction. Even if the strategy underperforms, the codebase is a solid template for automated Polymarket bots.”
“GPQA Diamond is one benchmark. One. Benchmark performance doesn't translate linearly to real-world task performance, especially for a merged model that hasn't been fine-tuned for instruction following or RLHF alignment. Impressive number, but I'd want to see this on coding, reasoning chains, and RAG tasks before getting excited.”
“The strategy looks good in backtests but Polymarket's liquidity is thin and arbitrageurs will price this edge away quickly once it's well-known. Also: 'nothing ever happens' is survivorship bias dressed as strategy—the times something DOES happen, you're wiped out. Don't put meaningful capital here.”
“Model merging is the dark horse of AI efficiency research. If MRI-guided DARE-TIES merging can reliably produce results like this, it suggests we're nowhere near the ceiling for extracting value from existing open-weight models. The future may involve less training and more intelligent composition.”
“Autonomous agents that trade prediction markets based on LLM-assessed epistemic calibration is a genuinely new thing. If this works at scale, it could actually make prediction markets more accurate by algorithmically correcting for human doom-bias. That's a more interesting outcome than any individual P&L.”
“A capable model in the 4-5B range that can run on a MacBook M-series is exactly what solo creators need for on-device inference. If Darwin-4B-David's performance holds on creative tasks, it's a genuine local creative AI for people without cloud budgets.”
“Sterling Crispin making a 'nothing ever happens' bot is peak art-meets-tech. It's a functional piece of commentary on the anxiety economy—we're so primed for crisis that prediction markets misprice normalcy. The aesthetic of it is as interesting as the trading logic.”