Ternary Bonsai
1.58-bit LLMs that fit in 1.75 GB and run in your browser via WebGPU
Expert verdict
Ship
3-1
The Panel's Take
PrismML's Ternary Bonsai is a family of ultra-compressed language models using 1.58-bit weights: every parameter is stored as -1, 0, or +1, with no higher-precision layers anywhere in the architecture. The line-up covers 8B, 4B, and 1.7B parameter models. The flagship 8B model fits in 1.75 GB of RAM, roughly a 9x reduction versus the 16 GB a 16-bit baseline would need. Earlier 1-bit experiments felt like a party trick with serious capability regressions; Ternary Bonsai 8B, by contrast, outperforms PrismML's own prior 1-bit Bonsai 8B by 5 points on average across standard benchmarks.
The team also ships WebGPU inference, so the 1.7B model runs entirely in a browser tab. By the panel's reckoning, this is the first time a production-quality chat model has run with no server at all.
The real-world use case is edge and offline deployment: medical devices, air-gapped government systems, and consumer apps that need to work without a signal. At 1.75 GB, the 8B model fits in the VRAM of a six-year-old gaming laptop. PrismML is positioning this as the foundation for truly offline AI, a credible claim if the capability benchmarks hold up under real-world testing.
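The memory figures quoted above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not PrismML's own accounting: it assumes the 8B model has exactly 8 billion parameters and that "GB" means 10^9 bytes, neither of which the review states, and the function names are purely illustrative.

```python
import math

# Information content of one ternary weight (-1, 0, +1): log2(3) ≈ 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

# Assumption: "8B" means exactly 8 billion parameters.
PARAMS_8B = 8e9
# Figure reported in the review for the 8B model.
REPORTED_GB = 1.75


def footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Weight storage in GB (10^9 bytes); ignores activations and KV cache."""
    return num_params * bits_per_param / 8 / 1e9


def bits_per_param(size_gb: float, num_params: float) -> float:
    """Effective bits per parameter implied by a given on-disk or in-RAM size."""
    return size_gb * 1e9 * 8 / num_params


fp16_gb = footprint_gb(PARAMS_8B, 16)
ideal_ternary_gb = footprint_gb(PARAMS_8B, BITS_PER_TERNARY_WEIGHT)
reduction = fp16_gb / REPORTED_GB
effective_bits = bits_per_param(REPORTED_GB, PARAMS_8B)

print(f"16-bit baseline:        {fp16_gb:.2f} GB")
print(f"Ideal ternary floor:    {ideal_ternary_gb:.2f} GB")
print(f"Reduction at 1.75 GB:   {reduction:.2f}x")
print(f"Effective bits/param:   {effective_bits:.2f}")
```

Under these assumptions the 16-bit baseline comes to 16.00 GB, the theoretical ternary floor to about 1.58 GB, and the reported 1.75 GB to a reduction of about 9.14x, or 1.75 effective bits per parameter. The gap above log2(3) would be consistent with packing and metadata overhead, though the review does not say where it comes from.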
The reviews
“1.75 GB for an 8B model is a genuine engineering achievement. I can finally ship a capable model inside a desktop Electron app without requiring users to have a dedicated GPU. The WebGPU demo loads fast and output quality is surprisingly coherent for its size.”
“Benchmarks are one thing; real task performance is another. A 9x memory saving typically comes with a 15-30% quality drop on anything beyond simple Q&A. And 'scores 5 points higher than our previous 1-bit model' is a low bar when the previous model wasn't competitive with 4-bit quants.”
“Browser-native LLMs with no server change the entire privacy calculus. If this scales to 13B+ parameter territory at comparable compression ratios, every personal AI assistant can run offline on consumer hardware. That's a trajectory worth tracking closely.”
“WebGPU inference means I can build offline creative tools — grammar checkers, caption writers, image prompt expanders — without an API key or monthly cost. The 1.7B model is small enough to embed in a browser extension with manageable download size.”