
Bonsai-8B

A true 1-bit 8B LLM that fits in 1.15 GB — runs on your iPhone

Price: Free / Apache 2.0 · Reviewed: 2026-04-22
Verdict: Ship
3 Ships · 1 Skip
Visit prismml.com

The Panel's Take

Bonsai-8B is the latest model in PrismML's BitNet-inspired lineage: an 8.2B-parameter language model whose weights use true 1-bit precision (each stored as -1 or +1), compressing the entire model to just 1.15 GB. That is roughly 14x smaller than the ~16.4 GB an FP16 copy would need (8.2B parameters × 2 bytes). Unlike post-training quantization hacks that lose substantial quality, PrismML trained Bonsai-8B with 1-bit arithmetic built into the forward pass from the start.

Benchmark results are competitive for the size class: 63.8 on MMLU, 72.1 on HellaSwag, and 54.2 on GSM8K. The model runs at 131 tokens/sec on an M4 Pro MacBook and 44 tokens/sec on an iPhone 17 Pro Max, which makes it the fastest locally runnable 8B-class model on Apple Silicon. The MLX-optimized weights are available on Hugging Face today under Apache 2.0.

The significance goes beyond benchmarks. Getting a capable open-weight model to run at interactive speeds on consumer hardware, with no API key, no GPU, and no cloud dependency, is a meaningful step toward truly private, offline AI. Bonsai-8B follows PrismML's earlier "Ternary Bonsai" (1.58-bit) but uses a cleaner binary architecture that is easier to accelerate on custom silicon.
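To make the storage claim concrete, here is a minimal Python sketch of how ±1 weights can be packed eight to a byte and then used in a dot product with a per-tensor scale. The bit layout (1 means +1, 0 means -1), the single `scale` factor, and the helper names `pack_signs`, `unpack_signs`, `binary_dot`, and `footprint_gb` are illustrative assumptions, not PrismML's actual format or kernels.

```python
# Hypothetical sketch of 1-bit (+1/-1) weight storage; not PrismML's actual format.
# Assumptions: bit value 1 encodes +1, bit value 0 encodes -1, weights are packed
# 8 per byte (least-significant bit first), and one float scale covers the tensor.


def pack_signs(weights):
    """Pack a sequence of +1/-1 values into bytes, 8 weights per byte."""
    out = bytearray((len(weights) + 7) // 8)
    for i, w in enumerate(weights):
        if w not in (1, -1):
            raise ValueError(f"weight {i} is {w!r}, expected +1 or -1")
        if w == 1:
            out[i // 8] |= 1 << (i % 8)  # set the bit for +1; leave it 0 for -1
    return bytes(out)


def unpack_signs(packed, n):
    """Recover the first n +1/-1 weights from packed bytes."""
    return [1 if (packed[i // 8] >> (i % 8)) & 1 else -1 for i in range(n)]


def binary_dot(packed, n, scale, x):
    """Return scale * sum(w_i * x_i); each w_i is +/-1, so no multiplies are needed."""
    acc = 0.0
    for i in range(n):
        if (packed[i // 8] >> (i % 8)) & 1:
            acc += x[i]  # weight is +1
        else:
            acc -= x[i]  # weight is -1
    return scale * acc


def footprint_gb(n_params, bits_per_param):
    """Raw weight storage in GB (10**9 bytes), ignoring scales and other overhead."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    w = [1, -1, -1, 1, 1, 1, -1, 1, -1]
    packed = pack_signs(w)
    print(len(packed))                         # 2 (9 weights fit in 2 bytes)
    print(unpack_signs(packed, len(w)) == w)   # True
    x = [0.5, 2.0, 1.0, -1.0, 3.0, 0.25, 0.5, 1.5, 4.0]
    print(binary_dot(packed, len(w), 0.1, x))  # approximately -0.325
    print(footprint_gb(8.2e9, 16))             # 16.4 (FP16)
    print(footprint_gb(8.2e9, 1))              # 1.025 (1-bit)
```

The last two lines reproduce the size arithmetic: 8.2B weights at 1 bit come to about 1.025 GB of raw storage, versus 16.4 GB at FP16. The review does not say what accounts for the remaining ~0.12 GB of the reported 1.15 GB; scale factors or layers kept at higher precision would be one plausible explanation, but that is an assumption. Real inference kernels vectorize the add/subtract loop on the GPU or Neural Engine rather than iterating in Python.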



Score: Ship · 7.5/10

The reviews

131 tokens/sec on M4 Pro at 1.15 GB is genuinely impressive — I can embed this in a macOS app without any cloud dependency, no rate limits, no privacy concerns. The Apache 2.0 license means I can ship commercial products on top of it. This is the edge AI story I've been waiting for.


63.8 on MMLU is respectable but it's still noticeably behind mid-range cloud models on reasoning tasks. The GSM8K score of 54.2 means it'll fumble multi-step math that users expect to just work. Until 1-bit gets to 70B scale, it's a neat demo that falls short in production use cases where quality matters.


The trajectory here is what matters: 1-bit models are getting cheaper to train and becoming competitive faster than expected. When custom Apple Neural Engine kernels land for BitNet-style weights, we'll see 200+ tokens/sec on a phone. Bonsai-8B is the proof of concept that makes that future feel real.


I've been looking for something I can embed in a creative writing or brainstorming app that doesn't require an internet connection. At 44 tokens/sec on iPhone, Bonsai-8B is finally fast enough not to break the creative flow. The "no account required" angle is a genuine selling point for privacy-conscious users.

