Parlor
Real-time voice + vision AI that runs 100% on your local machine
Expert verdict
Ship
3-1The Panel's Take
Parlor is an open-source Python/FastAPI app that gives you a fully local, real-time multimodal AI assistant — you speak to it and show it your camera, and it responds with synthesized voice, all on-device. It uses Gemma 4 for vision and language understanding and Kokoro for text-to-speech, delivering end-to-end latency of around 2.5-3 seconds on an Apple M3 Pro without touching any cloud API. What makes Parlor stand out is barge-in support — you can interrupt the AI mid-sentence, just like a real conversation — and cross-platform inference: MLX on macOS for GPU acceleration, ONNX on Linux. The creator benchmarked 83 tokens/second on an M3 Pro and provided reproducible setup instructions in under ten lines of shell. It surfaced on Hacker News as a 'Show HN' post and quickly accumulated over 50 upvotes, with developers praising the honest latency numbers and the fact that the entire stack — from audio capture to TTS playback — is open-sourceable and self-hostable with no API key required.
Share this verdict
Parlor verdict: SHIP 🚀 3 ships · 1 skip from the expert panel Full review: shiporskip.io/tool/parlor-on-device-realtime-voice-vision-ai-local-gemma
Weekly AI Tool Verdicts
Get the next verdict in your inbox
7 critics review a new AI tool every day. Weekly digest — free.
Embed this verdict
Tool makers can add a live ShipOrSkip badge to their site. Badge loads track impressions; clicks route back to this review.
<a href="https://shiporskip.io/api/badge-click/parlor-on-device-realtime-voice-vision-ai-local-gemma" target="_blank" rel="noopener"><img src="https://shiporskip.io/api/badge/parlor-on-device-realtime-voice-vision-ai-local-gemma" alt="Parlor Ship verdict on ShipOrSkip" width="360" height="90" /></a>[](https://shiporskip.io/api/badge-click/parlor-on-device-realtime-voice-vision-ai-local-gemma)<iframe src="https://shiporskip.io/embed/parlor-on-device-realtime-voice-vision-ai-local-gemma" title="Parlor ShipOrSkip verdict" width="360" height="260" style="border:0;border-radius:16px;max-width:100%;" loading="lazy"></iframe>The reviews
“Finally a local voice+vision stack that actually benchmarks its own latency instead of hiding behind vague demos. The MLX path on Apple Silicon is fast, barge-in works, and the codebase is small enough to fork and own. This is the foundation I'd build a personal assistant on.”
“2.5-3 second latency is fine for demos but painfully slow for natural conversation — real barge-in at that speed still feels robotic. And Gemma 4 as the vision model is a step behind GPT-4V or Claude in accuracy. Until latency drops to sub-second, this is a weekend project, not a daily driver.”
“The local-first AI assistant with eyes and ears is the endgame for ambient computing. Parlor is the earliest working prototype of a future where your laptop has a persistent, private AI companion that sees what you see. Get familiar with this architecture now — it will be mainstream in 18 months.”
“Being able to point my camera at a draft design and ask what's wrong with this layout while talking out loud — all offline — is genuinely useful. The voice output quality from Kokoro is surprisingly good. I'd use this during creative sessions where I don't want to type.”