Question 1

Which is better: Qwen3.5-Omni or Ternary Bonsai?

Accepted Answer

Both models received a panel verdict of Ship. Our expert panel gives Qwen3.5-Omni the stronger verdict of the two, with a 75% Ship rate.

Question 2

Is Qwen3.5-Omni free?

Accepted Answer

Qwen3.5-Omni is proprietary. It is offered through the Alibaba Cloud API rather than as an open-source release.

Question 3

Is Ternary Bonsai free?

Accepted Answer

Yes. Ternary Bonsai is free and open source, released under the Apache 2.0 license.

Question 4

What do experts say about Qwen3.5-Omni vs Ternary Bonsai?

Accepted Answer

Qwen3.5-Omni: Qwen3.5-Omni is Alibaba's most advanced multimodal model yet — a native Thinker-Talker architecture that processes and generates text, audio, and video in a single unified system. Released in three variants (Plus, Flash, Light), it supports a 256k context window, 10+ hours of audio, and 400 seconds of 720p video at 1 FPS, with speech recognition across 113 languages and dialects.

The headline capability is what Alibaba is calling "Audio-Visual Vibe Coding" — an emergent behavior where the model writes functional code based solely on watching a video and listening to spoken instructions. In demos, it takes a hand-drawn sketch held up to a camera and converts it into a working React webpage in real time. This wasn't an explicitly trained capability; it emerged from the model's unified multimodal architecture.

The model uses semantic interruption and turn-taking intent recognition for real-time interaction, and TMRoPE for temporal multimodal position encoding. The catch: Alibaba broke from its open-source streak and kept Qwen3.5-Omni proprietary, accessible only through their chatbot interface and Alibaba Cloud. The open-source community has noticed, and is not pleased.

Ternary Bonsai: PrismML's Ternary Bonsai is a family of aggressively quantized language models that take the BitNet concept to its logical extreme. Each weight is constrained to one of three values, {-1, 0, +1}, with a shared FP16 scale factor per 128-weight group. No higher-precision escape hatches, no hybrid layers. The result is a 9x reduction in memory footprint versus standard 16-bit models.
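To make the ternary scheme concrete, here is a minimal Python sketch of group-wise ternary quantization: each 128-weight group is reduced to codes in {-1, 0, +1} plus one shared scale. The function names are hypothetical, and the scale rule (mean absolute value of the group, as in BitNet b1.58) is an assumption; the description above does not say how Ternary Bonsai actually chooses its scales.

```python
from typing import List, Tuple

GROUP_SIZE = 128  # group size taken from the description above


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map one group of weights to ternary codes plus a single shared scale.

    Assumption: the scale is the group's mean absolute value ("absmean").
    A real implementation would also store the scale as FP16.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # An all-zero group needs no scale; every code is 0.
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into {-1, 0, +1}.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights: each value is -scale, 0, or +scale."""
    return [c * scale for c in codes]


def quantize(weights: List[float],
             group_size: int = GROUP_SIZE) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    codes, scale = quantize_group([0.9, -1.1, 0.05, 0.0])
    print(codes)                           # [1, -1, 0, 0]
    print(dequantize_group(codes, scale))  # large weights snap to +/- scale
```

Small weights collapse to 0 and large ones snap to plus or minus the group scale, which is why a single scale per group is enough to reconstruct every weight in it.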

The numbers are striking: the 8B model fits in 1.75 GB and hits 82 tokens per second on an M4 Pro. More impressively, it runs at 27 tokens per second on an iPhone 17 Pro Max — fast enough for real-time conversation on-device. The 8B variant scores 75.5 average across standard benchmarks, outperforming many models that are 9-10x larger. The 4B and 1.7B variants push further into mobile-optimized territory.
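As a rough sanity check on the memory figures, the arithmetic can be sketched in a few lines of Python. Two assumptions are mine and not from the source: ternary codes are packed at the information-theoretic ideal of log2(3) bits each, and "8B" means exactly 8 billion parameters. Real packing and the true parameter count will shift the result slightly.

```python
import math

GROUP_SIZE = 128     # weights per shared scale (from the description above)
SCALE_BITS = 16      # one FP16 scale per group
BASELINE_BITS = 16   # standard 16-bit weights


def bits_per_weight(group_size: int = GROUP_SIZE,
                    scale_bits: int = SCALE_BITS) -> float:
    """Ideal ternary code size plus the amortized cost of the group scale."""
    return math.log2(3) + scale_bits / group_size


def footprint_gb(num_params: float, bpw: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bpw / 8 / 1e9


bpw = bits_per_weight()
ratio = BASELINE_BITS / bpw
size = footprint_gb(8e9, bpw)  # assumption: exactly 8e9 parameters
print(f"{bpw:.2f} bits/weight, {ratio:.1f}x smaller, {size:.2f} GB")
# prints: 1.71 bits/weight, 9.4x smaller, 1.71 GB
```

Under these assumptions the estimate lands close to the quoted 9x reduction and 1.75 GB footprint; the small gap is plausibly due to packing overhead or a parameter count slightly above 8 billion.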

All three models are released under the Apache 2.0 license, available on Hugging Face and GitHub, and integrated into the Locally AI iOS app for immediate on-device deployment. For developers building privacy-sensitive applications or anyone tired of paying cloud inference costs, Ternary Bonsai offers a compelling on-device alternative that doesn't require a beefy GPU.
