AI tool comparison

Ghost Pepper vs VoxCPM2

Which one should you ship with? Here are the panel verdicts, pricing, reviewer split, and community votes, side by side.

Voice & Dictation

Ghost Pepper

Hold Control. Speak. Release. It types for you — all on-device.

Verdict: Ship · Panel ship: 75% · Community: no votes yet · Entry: Free

Ghost Pepper is a macOS hold-to-talk dictation app that runs entirely on-device, using WhisperKit for speech recognition and LLM.swift for smart cleanup. You hold the Control key to record and release it to transcribe; the text is then pasted automatically into whatever app you're using. No cloud, no subscription, and no data ever leaves your Mac.

The "smart cleanup" feature is what sets it apart from basic Whisper wrappers: a local language model removes filler words, resolves self-corrections in real time, and cleans up stutters without altering your intent. Version 2.0.1, released April 6, brings improved accuracy and lower latency on Apple Silicon. It requires macOS 14+ and an Apple Silicon chip.

Ghost Pepper hit the top of Hacker News' Show HN section on April 7 with 354 points and 164 comments, an unusually strong signal for a solo-dev open-source tool. The timing is notable: as commercial dictation tools like Wispr Flow move to paid-only models, Ghost Pepper offers a fully free, auditable alternative. It's MIT-licensed and available on GitHub.
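The hold, release, clean up, paste flow described above can be sketched as a small state machine. This is an illustration only, written in Python rather than the app's Swift, and it is not Ghost Pepper's code: `HoldToTalk`, `strip_fillers`, and the injected `transcribe` / `cleanup` / `paste` callables are all hypothetical names, and the regex cleanup is a deliberately crude stand-in for the local language model the app actually uses.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for the LLM cleanup step: a regex that drops a few
# filler words. The real app uses a local language model, not a regex.
FILLERS = re.compile(r"\b(um+|uh+|er+)\b,?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove simple filler words and collapse leftover whitespace."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


@dataclass
class HoldToTalk:
    """Record while the key is held; transcribe, clean, and paste on release."""

    transcribe: Callable[[bytes], str]   # assumed speech-to-text hook
    cleanup: Callable[[str], str]        # assumed text cleanup hook
    paste: Callable[[str], None]         # assumed "type into focused app" hook
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Start a fresh recording each time the key is pressed.
        self.recording = True
        self.chunks = []

    def feed_audio(self, chunk: bytes) -> None:
        # Audio is only kept while the key is held, so nothing is
        # captured outside an explicit hold.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        raw_text = self.transcribe(b"".join(self.chunks))
        self.paste(self.cleanup(raw_text))


if __name__ == "__main__":
    pasted: List[str] = []
    app = HoldToTalk(
        transcribe=lambda audio: "Um, send the uh report today",  # fake STT
        cleanup=strip_fillers,
        paste=pasted.append,
    )
    app.key_down()
    app.feed_audio(b"\x00\x01")
    app.key_up()
    print(pasted)
```

Passing the three stages in as callables keeps the key-handling logic separate from the models, which is one plausible way to swap a heavier or lighter cleanup step without touching the recording path.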

Audio & Voice

VoxCPM2

Tokenizer-free TTS: voice design, cloning, and 30 languages from 2B params

Verdict: Ship · Panel ship: 75% · Community: no votes yet · Entry: Free (open source)

VoxCPM2 is an open-source text-to-speech system from OpenBMB that takes a fundamentally different architectural approach to speech synthesis. Instead of the discrete tokenization pipeline used by most modern TTS systems, VoxCPM2 operates entirely in latent space through a diffusion autoregressive pipeline, bypassing tokenization altogether. The 2B-parameter model was trained on over 2 million hours of multilingual speech and supports 30 languages plus 9 Chinese dialects, with no language tagging needed.

What makes VoxCPM2 stand out is its three-mode voice control system. "Voice Design" creates entirely new voices from a natural language description alone ("young woman, gentle voice, slightly husky"), with no reference audio required. "Controllable Voice Cloning" takes a reference clip and lets you adjust style and emotion. "Ultimate Cloning" provides maximum fidelity when you supply both the reference audio and its transcript.

Output is 48kHz studio-grade audio, and the model runs at a real-time factor (RTF) of ~0.3 on an RTX 4090, or ~0.13 with Nano-vLLM acceleration. An RTF below 1 means audio is generated faster than it plays back. The Apache 2.0 license makes VoxCPM2 commercially viable for builders who've been held back by restrictive TTS licensing. It benchmarks competitively with commercial models on Seed-TTS-eval across English and Mandarin. The Hugging Face demo is live, weights are published, and it installs via `pip install voxcpm`. For any developer building voice products, this is worth evaluating immediately.
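To make the quoted RTF figures concrete, here is a minimal sketch that converts a real-time factor into wall-clock estimates, using the standard definition RTF = processing time / audio duration. The function names and the one-minute clip length are assumptions chosen for illustration; the sketch does not call VoxCPM2 and says nothing about its actual API.

```python
def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock time to synthesize `audio_seconds` of speech.

    RTF (real-time factor) = processing time / audio duration, so the
    processing time is simply audio duration multiplied by RTF.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def realtime_speedup(rtf: float) -> float:
    """How many times faster than playback a single stream is generated."""
    if rtf <= 0:
        raise ValueError("rtf must be > 0")
    return 1.0 / rtf


if __name__ == "__main__":
    clip_seconds = 60.0  # hypothetical one-minute narration
    # RTF values are the ones quoted in the description above.
    for label, rtf in [("RTX 4090", 0.3), ("RTX 4090 + Nano-vLLM", 0.13)]:
        print(
            f"{label}: {generation_seconds(clip_seconds, rtf):.1f} s to generate, "
            f"{realtime_speedup(rtf):.1f}x real time"
        )
```

At the quoted figures, a one-minute clip works out to roughly 18 seconds of compute without acceleration and about 8 seconds with it. These are single-stream estimates on that specific GPU; slower hardware or concurrent requests would change the numbers.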

| Decision | Ghost Pepper | VoxCPM2 |
| --- | --- | --- |
| Panel verdict | Ship · 3 ship / 1 skip | Ship · 3 ship / 1 skip |
| Community | No community votes yet | No community votes yet |
| Pricing | Free / Open Source (MIT) | Free / Open Source (Apache 2.0) |
| Best for | Hold Control. Speak. Release. It types for you, all on-device. | Tokenizer-free TTS: voice design, cloning, and 30 languages from 2B params |
| Category | Voice & Dictation | Audio & Voice |

Reviewer scorecard

Builder
Ghost Pepper · 80/100 · ship

This is the dictation tool I've been waiting for. On-device, zero latency once warmed up, MIT license, and the LLM cleanup actually works. I replaced Wispr Flow with this in under 5 minutes. The Control-hold UX is more ergonomic than I expected.

VoxCPM2 · 80/100 · ship

Apache 2.0 + pip install + 48kHz output is the holy grail for voice product builders. Most open TTS models either sound robotic, have restrictive licenses, or require complex setup. VoxCPM2 clears all three bars. The voice design feature alone changes how you prototype voice UX — describe the persona instead of recording it.

Skeptic
Ghost Pepper · 45/100 · skip

Apple Silicon only and macOS 14+ means a significant portion of Mac users are locked out. The 'smart cleanup' LLM adds another model to memory — not ideal if you're already running other local models. Also, no GUI means non-technical users won't touch it.

VoxCPM2 · 45/100 · skip

RTF of 0.3 on an RTX 4090 means real-time generation requires serious hardware — most small builders can't run this locally at scale. The technical report isn't published yet, so the benchmark claims are harder to independently verify. And 30 languages sounds impressive until you check whether your target dialect is actually well-represented in those 2M training hours.

Futurist
Ghost Pepper · 80/100 · ship

Ghost Pepper is a preview of how computing will feel in 5 years: ambient voice input everywhere, zero latency, zero cloud dependency. The fact that a solo dev shipped this in Swift using WhisperKit and LLM.swift is a testament to how capable the Apple Neural Engine stack has become.

VoxCPM2 · 80/100 · ship

The shift away from discrete tokenization in TTS is architecturally significant — it mirrors the same trajectory that diffusion models took in image generation, and look how that ended. VoxCPM2 is an early signal that the tokenize-everything paradigm in audio is starting to crack. The end state is real-time, hyper-expressive voice synthesis running on consumer hardware.

Creator
Ghost Pepper · 80/100 · ship

I tried it during a writing session and the filler-word removal alone is worth it — my raw dictation comes out cleaner than when I type. The hold-to-talk model also means I'm never accidentally recording. Solid privacy story for journaling and creative work.

VoxCPM2 · 80/100 · ship

Designing voices with natural language instead of recording sessions is a genuine workflow unlock for content creators and game developers. The ability to describe 'tired, slightly gruff narrator in his 50s' and get consistent output is something I've wanted for years. The 48kHz output quality means it's usable in professional audio contexts without upsampling.
