Question 1

Which is better: LFM2.5-VL or Ternary Bonsai?

Accepted Answer

Based on our expert panel, LFM2.5-VL has the stronger verdict, with a 75% Ship rate. Both LFM2.5-VL and Ternary Bonsai received an overall panel verdict of Ship.

Question 2

Is LFM2.5-VL free?

Accepted Answer

Yes. LFM2.5-VL is listed as Open Weights and is available free on HuggingFace.

Question 3

Is Ternary Bonsai free?

Accepted Answer

Ternary Bonsai's pricing is listed as Open Source.

Question 4

What do experts say about LFM2.5-VL vs Ternary Bonsai?

Accepted Answer

LFM2.5-VL: Liquid AI just shipped LFM2.5-VL, a 450M-parameter vision-language model engineered from the ground up for edge deployment. Unlike most VLMs that require a beefy GPU in the cloud, LFM2.5-VL targets devices like the Snapdragon 8 Elite, NVIDIA Jetson Orin, and AMD Ryzen AI — hitting sub-250ms latency on-device without any cloud round-trip.

This model builds significantly on its predecessor with four new capabilities: bounding box prediction (81.28 on RefCOCO-M), multilingual support across 8 languages, function calling, and improved instruction following. Those aren't just benchmark checkboxes — bounding box prediction means you can run visual grounding and object detection pipelines on a phone or robot without any server involvement.

Liquid AI is the MIT-spun startup behind Liquid Foundation Models (LFMs), a non-Transformer architecture that delivers competitive performance at a fraction of the memory footprint. LFM2.5-VL is available free on HuggingFace and through Liquid's LEAP inference platform. For builders targeting on-device AI (robotics, mobile, embedded), this is one of the most practical releases of the month.

Ternary Bonsai: PrismML's Ternary Bonsai is a family of ultra-compressed language models using 1.58-bit weights, meaning every parameter is stored as -1, 0, or +1, with no higher-precision layers anywhere in the architecture. The line-up covers 8B, 4B, and 1.7B parameter models. The flagship 8B model fits in 1.75 GB of RAM, a 9x reduction versus a 16-bit baseline.
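The "1.58-bit" and "9x" figures can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not PrismML's own accounting: it assumes a nominal 8 billion parameters, decimal gigabytes, and weights-only storage, and the function names are made up for illustration.

```python
import math

# Assumption: "8B" is treated as exactly 8e9 parameters; the real count may differ.
NOMINAL_PARAMS_8B = 8e9


def ternary_bits_per_weight() -> float:
    """Information content of one weight drawn from {-1, 0, +1}."""
    return math.log2(3)


def weights_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


fp16_gb = weights_size_gb(NOMINAL_PARAMS_8B, 16)
ideal_ternary_gb = weights_size_gb(NOMINAL_PARAMS_8B, ternary_bits_per_weight())
reported_gb = 1.75  # size quoted in the review above

print(f"bits per ternary weight: {ternary_bits_per_weight():.3f}")
print(f"16-bit baseline:         {fp16_gb:.2f} GB")
print(f"ideal ternary packing:   {ideal_ternary_gb:.2f} GB")
print(f"reduction at 1.75 GB:    {fp16_gb / reported_gb:.1f}x")
```

Under these assumptions, three states per weight work out to log2(3), or about 1.58 bits, and the quoted 1.75 GB sits slightly above the theoretical minimum. The gap could come from packing overhead or per-group metadata, but the review does not say.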

Unlike earlier 1-bit experiments that felt like a party trick with serious capability regressions, Ternary Bonsai 8B outperforms PrismML's own prior 1-bit Bonsai 8B by 5 points on average across standard benchmarks. The team also ships WebGPU inference, so the 1.7B model runs entirely in a browser tab. That puts a usable chat model in the browser with no inference server at all.

The real-world use case is edge and offline deployment: medical devices, air-gapped government systems, consumer apps that need to work without a signal. At 1.75 GB, the 8B model fits in the GPU memory of a six-year-old gaming laptop. PrismML is positioning this as the foundation for truly offline AI, a credible claim if the capability benchmarks hold up under real-world testing.

