Question 1

Which is better: Tencent Hy3-preview or VoxCPM2?

Accepted Answer

Based on our expert panel, Tencent Hy3-preview has a stronger verdict with a 75% Ship rate. Tencent Hy3-preview received a panel verdict of Ship and VoxCPM2 received Ship.

Question 2

Is Tencent Hy3-preview free?

Accepted Answer

Tencent Hy3-preview pricing: Open Source (free on HuggingFace, free tier on OpenRouter)

Question 3

Is VoxCPM2 free?

Accepted Answer

VoxCPM2 pricing: Free / Open Source

Question 4

What do experts say about Tencent Hy3-preview vs VoxCPM2?

Accepted Answer

Tencent Hy3-preview: Tencent's Hy3-preview is the company's first public frontier-class language model, released April 23 as open weights on Hugging Face. The model is a 295B parameter Mixture-of-Experts architecture with only 21B parameters active per token — keeping inference costs comparable to much smaller dense models while reaching capabilities that compete with leading proprietary systems.

The release comes under new leadership: Yao Shunyu, a former OpenAI researcher, joined Tencent in early 2026 to build out its frontier AI effort. The team claims to have gone from project start to public release in under three months — an unusually fast timeline for a model of this scale. The 256K context window and strong performance on agentic and coding benchmarks position it directly against GLM-5.1 and Qwen3.6 in the open-source frontier race.

Free inference is available on OpenRouter's free tier at launch, with the model also appearing on Hugging Face's Inference API. The architecture uses 192 routed experts in a hybrid dense-MoE configuration. For teams needing a capable open-weights model for agentic workflows without paying proprietary API rates, Hy3-preview arrives as a credible option at a remarkable cost-to-capability ratio. VoxCPM2: VoxCPM2 is a 2-billion-parameter text-to-speech model from OpenBMB that scraps discrete tokenization entirely, working directly in continuous latent space via a diffusion autoregressive architecture. Unlike dominant TTS approaches (VALL-E, Tortoise, XTTS), it never converts audio to discrete tokens — diffusion handles the full generation pipeline, resulting in 48kHz studio-quality output.

It supports 30 languages without requiring language tags, zero-shot voice cloning from reference audio, and — most distinctly — voice design from pure natural-language descriptions. You can prompt "a warm, slightly raspy woman in her 40s who sounds like a news anchor" and get a consistent new voice without providing any reference audio. Trained on 2M+ hours of multilingual data.

Released under Apache 2.0, making it commercially usable. The architecture diverges meaningfully from existing open-source TTS options and introduces a novel UX primitive (describe a voice, get a voice) that could reshape how developers approach voice synthesis in products.

Tencent Hy3-preview vs VoxCPM2

Tencent Hy3-preview

VoxCPM2

Bookmarks