AWS Bedrock Launches Real-Time Voice API with Sub-400ms Latency

Amazon Web Services has added a real-time voice API to Amazon Bedrock, enabling enterprises to wire foundation models directly into telephony and WebRTC audio streams with sub-400ms latency. Anthropic Claude and Amazon Nova are the supported models at launch.


Amazon Web Services has expanded Amazon Bedrock with a real-time voice API that connects foundation models to live audio streams, targeting enterprise telephony systems and WebRTC-based applications. The API is designed to handle the full-duplex audio pipeline (ingesting streaming audio, running inference, and returning synthesized speech) with a claimed end-to-end latency under 400 milliseconds.

At launch, the API supports two model families: Anthropic Claude and Amazon Nova. This positions the feature as a managed alternative to stitching together separate speech-to-text, LLM, and text-to-speech services, which has been the dominant architecture for voice AI deployments to date. By collapsing that pipeline into a single Bedrock endpoint, AWS is betting that enterprises will trade some architectural flexibility for reduced integration complexity and a single billing surface.
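To make the trade-off concrete, here is a minimal sketch of a latency budget for the three-service architecture described above. The stage names and every millisecond value are made-up placeholders for illustration, not measurements of any real service; only the 400 ms target comes from the announcement.

```python
from dataclasses import dataclass

BUDGET_MS = 400  # AWS's claimed end-to-end figure; not independently verified


@dataclass
class Stage:
    name: str
    latency_ms: int  # hypothetical placeholder, NOT a measured value


def total_latency_ms(stages):
    """Sum per-stage latencies, assuming the stages run strictly in sequence."""
    return sum(stage.latency_ms for stage in stages)


def headroom_ms(stages, budget_ms=BUDGET_MS):
    """Positive means under budget; negative means the budget is blown."""
    return budget_ms - total_latency_ms(stages)


# Illustrative cascaded stack: every number below is an assumption.
cascaded = [
    Stage("speech-to-text", 150),
    Stage("network hop", 30),
    Stage("LLM first token", 200),
    Stage("network hop", 30),
    Stage("text-to-speech first audio", 120),
]

print(total_latency_ms(cascaded))  # 530
print(headroom_ms(cascaded))       # -130
```

With these placeholder values the assembled stack overshoots the target, which is the kind of arithmetic a single managed endpoint would have to beat. Real stacks overlap stages through streaming, so a sequential sum is a pessimistic bound rather than a prediction.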

The telephony and WebRTC support is notable because it targets existing enterprise communication infrastructure rather than requiring new client applications. Contact center platforms, IVR systems, and browser-based communication tools are the obvious immediate use cases. AWS has not published detailed pricing for the voice API, nor the methodology or test conditions behind the latency figure.

The launch reflects a broader competitive dynamic in which cloud providers are racing to offer managed real-time AI audio infrastructure, reducing the need for customers to assemble bespoke stacks from specialist vendors. Whether the sub-400ms figure holds under production load and across geographies will determine how seriously enterprise buyers weigh it against purpose-built alternatives.
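One way a buyer might check a latency claim like this is to look at tail percentiles rather than an average. The sketch below uses a nearest-rank percentile over a list of per-turn latencies; the sample values are invented for illustration, and the `summarize` helper is a hypothetical name, not part of any AWS tooling.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms, budget_ms=400):
    """Report median, tail latencies, and the share of turns at or over the budget."""
    over = sum(1 for s in samples_ms if s >= budget_ms)
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "over_budget": over / len(samples_ms),
    }


# Invented per-turn latencies in milliseconds (not real measurements).
samples = [310, 320, 330, 340, 350, 360, 370, 380, 390, 720]
report = summarize(samples)
print(report["p50"])          # 350
print(report["p95"])          # 720
print(report["over_budget"])  # 0.1
```

In this invented sample the median sits comfortably under 400 ms while the 95th percentile does not, which is why a single headline number says little without the test conditions behind it.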

Panel Takes

The Builder

Developer Perspective

“The primitive here is a managed full-duplex audio-to-LLM-to-audio pipeline exposed as a single API endpoint — that's actually a meaningful abstraction over the three-service stack (Whisper or Deepgram, your LLM, Polly or ElevenLabs) most people are currently duct-taping together. The DX bet is that you give up per-component tunability in exchange for not writing your own streaming orchestration layer, and for a lot of production voice apps, that's the right trade. What I need before shipping: actual SDK examples that show how backpressure and interruption handling work, because that's where every voice pipeline dies in production, not the happy path.”
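The two failure points named in this take can be illustrated without any vendor SDK. The following is a minimal sketch using only Python's `asyncio`: a bounded queue provides backpressure between synthesis and playback, and task cancellation stands in for interruption (barge-in). The function names, chunk values, and timings are all assumptions for illustration and say nothing about how the Bedrock API actually behaves.

```python
import asyncio


async def synthesize(chunks, out_queue):
    """Producer: put() suspends while the bounded queue is full (backpressure)."""
    for chunk in chunks:
        await out_queue.put(chunk)
    await out_queue.put(None)  # end-of-utterance sentinel


async def play(out_queue, played):
    """Consumer: stands in for writing audio frames to the caller."""
    while True:
        chunk = await out_queue.get()
        if chunk is None:
            return
        await asyncio.sleep(0.01)  # simulated playback time per chunk
        played.append(chunk)


async def run_turn(chunks, barge_in_after=None):
    """Run one agent turn; optionally cancel it after `barge_in_after` seconds."""
    out_queue = asyncio.Queue(maxsize=2)  # producer can be at most 2 chunks ahead
    played = []
    producer = asyncio.create_task(synthesize(chunks, out_queue))
    consumer = asyncio.create_task(play(out_queue, played))
    if barge_in_after is not None:
        await asyncio.sleep(barge_in_after)  # hypothetical: caller starts talking
        producer.cancel()
        consumer.cancel()  # queued but unplayed audio is dropped with the queue
    await asyncio.gather(producer, consumer, return_exceptions=True)
    return played


full = asyncio.run(run_turn(list(range(5))))
print(full)  # [0, 1, 2, 3, 4]

interrupted = asyncio.run(run_turn(list(range(50)), barge_in_after=0.035))
print(len(interrupted) < 50)  # True
```

The design choice worth noting is that the queue bound limits how much already-synthesized audio can pile up, so an interruption discards only a small amount of stale speech instead of a full response.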

The Skeptic

Reality Check

“The sub-400ms claim is doing a lot of work here, and AWS has published zero methodology — no region, no audio length, no concurrency level, no model size. Direct competitors like Daily.co with Pipecat or Twilio's Voice Intelligence have been running production voice AI pipelines for real enterprise customers and can point at actual call center deployments. What kills this in 12 months is not a competitor — it's that the 400ms number turns into 700ms at scale in us-east-1 on a Tuesday afternoon, the enterprise pilots stall, and AWS quietly repositions this as a beta feature while the specialist vendors own the segment that actually cares about latency.”

The Founder

Business & Market

“The buyer is the enterprise contact center or UCaaS platform team, and the budget comes from whatever line item currently pays for their Nuance or Genesys AI integration — that's a real, large, and actively unhappy buyer pool. AWS's moat here isn't the voice API itself; it's that enterprises already running workloads on Bedrock get to consolidate vendors and simplify their security review, which is a genuine switching cost baked into procurement, not product. The risk is pricing: if AWS charges per-minute at rates that make sense for low-volume pilots but hurt at contact-center scale, they'll lose the exact buyers this feature was built for.”

The Futurist

Big Picture

“The thesis this bets on is specific and falsifiable: within three years, the dominant architecture for enterprise voice AI is a single managed cloud endpoint rather than an assembled stack of specialist services, and whoever owns that endpoint owns the conversation data and the model fine-tuning flywheel that follows. The second-order effect that matters most isn't faster IVR — it's that collapsing the pipeline into Bedrock means AWS gets access to enterprise audio data at a scale no specialist vendor can match, which compounds into better Nova models tuned on real call center speech. The dependency that has to hold: WebRTC and SIP remain the transport layer for enterprise voice long enough for AWS to establish this as the default integration point before some other abstraction displaces them both.”
