Bonsai-8B
A true 1-bit 8B LLM that fits in 1.15 GB — runs on your iPhone
The Panel's Take
Bonsai-8B is PrismML's latest model in their BitNet-inspired lineage: an 8.2B-parameter language model quantized end-to-end to true 1-bit precision (weights stored as -1 or +1), which compresses the entire model to just 1.15 GB. That's roughly 14x smaller than an FP16 equivalent, which would need about 16.4 GB at 2 bytes per parameter. Unlike post-training quantization hacks that lose substantial quality, PrismML trained Bonsai-8B with 1-bit arithmetic baked into the forward pass from the start.
Benchmark results are competitive for the size class: 63.8 on MMLU, 72.1 on HellaSwag, and 54.2 on GSM8K, while running at 131 tokens/sec on an M4 Pro MacBook and 44 tokens/sec on an iPhone 17 Pro Max. That makes it the fastest locally runnable model in its weight class on Apple Silicon. The MLX-optimized weights are available on Hugging Face today under Apache 2.0.
The significance goes beyond benchmarks. Getting a capable open-weight model to run at interactive speeds on consumer hardware, with no API key, no GPU, and no cloud dependency, is a meaningful step toward truly private, offline AI. This follows PrismML's earlier "Ternary Bonsai" (1.58-bit) but represents a cleaner binary architecture that's easier to accelerate on custom silicon.
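
The size figures quoted above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal gigabytes (1 GB = 10^9 bytes) and 2 bytes per FP16 parameter, and the bit layout in `pack_signs` is a hypothetical one chosen for clarity, not PrismML's actual weight format. The gap between the ideal 1-bit size and the reported 1.15 GB presumably comes from parts of the model stored at higher precision, but the source does not say which.

```python
def pack_signs(weights):
    """Pack +1/-1 weights into bytes, 8 per byte (+1 -> bit 1, -1 -> bit 0).

    Hypothetical layout for illustration; not PrismML's on-disk format.
    """
    packed = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w not in (-1, 1):
            raise ValueError(f"weight {w!r} at index {i} is not -1 or +1")
        if w == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, count):
    """Recover the first `count` +1/-1 weights from packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(count)]


PARAMS = 8.2e9       # parameter count quoted in the review
GB = 1e9             # assumption: decimal gigabytes
REPORTED_GB = 1.15   # model size quoted in the review

fp16_gb = PARAMS * 2 / GB                        # 2 bytes per parameter
ideal_1bit_gb = PARAMS / 8 / GB                  # exactly 1 bit per parameter
ratio = fp16_gb / REPORTED_GB                    # compression vs. FP16
effective_bits = REPORTED_GB * GB * 8 / PARAMS   # average bits per parameter

if __name__ == "__main__":
    sample = [1, -1, -1, 1, 1, 1, -1, 1, -1, 1]
    packed = pack_signs(sample)
    assert unpack_signs(packed, len(sample)) == sample
    print(f"{len(sample)} weights packed into {len(packed)} bytes")
    print(f"FP16 size:       {fp16_gb:.1f} GB")
    print(f"Ideal 1-bit:     {ideal_1bit_gb:.3f} GB")
    print(f"Compression:     {ratio:.1f}x vs FP16")
    print(f"Effective width: {effective_bits:.2f} bits per parameter")
```

Under these assumptions the reported file works out to a little over one bit per parameter on average, which is consistent with a model whose weights are overwhelmingly binary.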
Bonsai-8B verdict: SHIP (3 ships, 1 skip from the expert panel)
The reviews
“131 tokens/sec on M4 Pro at 1.15 GB is genuinely impressive — I can embed this in a macOS app without any cloud dependency, no rate limits, no privacy concerns. The Apache 2.0 license means I can ship commercial products on top of it. This is the edge AI story I've been waiting for.”
“63.8 on MMLU is respectable but it's still noticeably behind mid-range cloud models on reasoning tasks. The GSM8K score of 54.2 means it'll fumble multi-step math that users expect to just work. Until 1-bit gets to 70B scale, it's a neat demo that falls short in production use cases where quality matters.”
“The trajectory here is what matters: 1-bit models are getting faster to train and becoming competitive sooner than expected. When custom Apple Neural Engine kernels land for BitNet-style weights, we'll see 200+ tokens/sec on a phone. Bonsai-8B is the proof-of-concept that makes that future feel real.”
“I've been looking for something I can embed in a creative writing or brainstorming app that doesn't require an internet connection. At 44 tokens/sec on iPhone, Bonsai-8B is finally fast enough to not break the creative flow. The 'no account required' angle is a genuine selling point for privacy-conscious users.”