Ternary Bonsai

1.58-bit LLMs that fit in 1.75 GB and run in your browser via WebGPU

Price: Open Source · Reviewed: 2026-04-17

Expert verdict

Ship

3-1 (3 Ships, 1 Skip)
Visit prismml.com

The Panel's Take

PrismML's Ternary Bonsai is a family of ultra-compressed language models with 1.58-bit weights: every parameter is stored as -1, 0, or +1 (three states, so log2(3) ≈ 1.58 bits each), with no higher-precision layers anywhere in the architecture. The line-up covers 8B, 4B, and 1.7B parameter models. The flagship 8B model fits in 1.75 GB of RAM, roughly a 9x reduction versus a 16-bit baseline (8B parameters at 2 bytes each is about 16 GB).

Earlier 1-bit experiments felt like a party trick with serious capability regressions. Ternary Bonsai 8B, by contrast, outperforms PrismML's own prior 1-bit Bonsai 8B by 5 points on average across standard benchmarks. The team also ships WebGPU inference, so the 1.7B model runs entirely in a browser tab. That puts a production-quality chat model in the browser with no server at all.

The real-world use case is edge and offline deployment: medical devices, air-gapped government systems, and consumer apps that need to work without a signal. At 1.75 GB, the 8B model fits in the GPU memory of a six-year-old gaming laptop. PrismML is positioning this as the foundation for truly offline AI, a credible claim if the benchmark results hold up under real-world testing.
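The size figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a nominal 8 billion parameters and decimal gigabytes, and the helper name `footprint_gb` is hypothetical, not part of any PrismML tooling.

```python
import math


def footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "8B" means a nominal 8e9 parameters; the exact count may differ.
PARAMS_8B = 8e9
# Footprint quoted in the review for the 8B model.
REPORTED_GB = 1.75

# A weight with three possible values carries log2(3) bits of information.
bits_per_ternary_weight = math.log2(3)

# Baseline: the same parameter count at 16 bits per weight.
fp16_gb = footprint_gb(PARAMS_8B, 16)

# Lower bound if every weight were packed at exactly log2(3) bits.
ideal_ternary_gb = footprint_gb(PARAMS_8B, bits_per_ternary_weight)

# Reduction implied by the quoted footprint, and the bits per weight it implies.
reduction = fp16_gb / REPORTED_GB
effective_bits = REPORTED_GB * 1e9 * 8 / PARAMS_8B

print(f"bits per ternary weight: {bits_per_ternary_weight:.3f}")
print(f"16-bit baseline:         {fp16_gb:.2f} GB")
print(f"ideal ternary packing:   {ideal_ternary_gb:.2f} GB")
print(f"reduction vs 16-bit:     {reduction:.2f}x")
print(f"implied bits per weight: {effective_bits:.2f}")
```

Under these assumptions the quoted 1.75 GB works out to about 1.75 bits per weight, slightly above the 1.58-bit theoretical minimum. The gap is consistent with packing and metadata overhead, though the review does not say where it comes from.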

Ship · 7.5/10

The reviews

1.75 GB for an 8B model is a genuine engineering achievement. I can finally ship a capable model inside a desktop Electron app without requiring users to have a dedicated GPU. The WebGPU demo loads fast and output quality is surprisingly coherent for its size.

Benchmarks are one thing; real task performance is another. A 9x memory saving typically comes with a 15-30% quality drop on anything beyond simple Q&A. And 'scores 5 points higher than our previous 1-bit model' is a low bar when the previous model wasn't competitive with 4-bit quants.

Browser-native LLMs with no server change the entire privacy calculus. If this scales to 13B+ parameter territory at comparable compression ratios, every personal AI assistant can run offline on consumer hardware. That's a trajectory worth tracking closely.

WebGPU inference means I can build offline creative tools — grammar checkers, caption writers, image prompt expanders — without an API key or monthly cost. The 1.7B model is small enough to embed in a browser extension with manageable download size.
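For a rough sense of what "manageable download size" might mean, here is a back-of-the-envelope sketch. It rests on a loud assumption: that the roughly 1.75 bits per weight implied by the 8B model's 1.75 GB footprint also holds for the smaller models. The review gives no file sizes for the 1.7B or 4B models, so these are estimates, not published figures, and `weight_size_mb` is a hypothetical helper.

```python
def weight_size_mb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


# Assumption: the bits-per-weight implied by "8B in 1.75 GB" carries over to
# the smaller models. Actual file sizes are not stated in the review.
ASSUMED_BITS_PER_WEIGHT = 1.75

# Nominal parameter counts read off the model names (also an assumption).
LINEUP = {"1.7B": 1.7e9, "4B": 4e9, "8B": 8e9}

estimates_mb = {
    name: weight_size_mb(params, ASSUMED_BITS_PER_WEIGHT)
    for name, params in LINEUP.items()
}

for name, mb in estimates_mb.items():
    print(f"{name}: ~{mb:.0f} MB of weights")
```

If the assumption holds, the 1.7B model comes to roughly 370 MB of weights, before counting the tokenizer, runtime code, or any other assets an extension would bundle.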
