Voice AI is in an awkward stage. The capabilities that worked in demos a year ago are now table stakes; the things that used to fail still fail in roughly the same ways. The market hype has run ahead of what's deployable. An honest field guide to what is and isn't doable is less exciting than the LinkedIn version.

TL;DR

Bounded, transactional voice tasks (booking, status, password resets) work reliably.
Open-ended emotional or judgment-heavy conversations remain hard.
Numbers, names, and unusual vocabulary are still a notable failure mode.
Multilingual support is good but uneven: English and Spanish are great; lower-resource languages need testing.
Latency, escalation, and operations are where most teams fail, not core AI capability.

What works well

If your use case lives in this list, voice AI is probably ready for production:

Booking and rescheduling. Asking for a date, checking availability, confirming. The flows are bounded and the model can be very explicit about confirming details ("just to confirm, that's Tuesday the 15th at 3 PM — does that work?").
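
A minimal Python sketch of that confirm-back step follows. The function names (`ordinal`, `confirm_prompt`) and the prompt wording are illustrative assumptions, not part of any particular voice platform; the point is only that the agent reads the full slot back before committing it.

```python
from datetime import datetime

# Spelled out explicitly so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def ordinal(day: int) -> str:
    """Return the day of the month with its English suffix, e.g. 15 -> '15th'."""
    if 10 <= day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def confirm_prompt(slot: datetime) -> str:
    """Build the sentence the agent speaks before booking the slot."""
    hour12 = slot.hour % 12 or 12
    meridiem = "AM" if slot.hour < 12 else "PM"
    if slot.minute == 0:
        clock = f"{hour12} {meridiem}"
    else:
        clock = f"{hour12}:{slot.minute:02d} {meridiem}"
    day_name = DAY_NAMES[slot.weekday()]
    return (f"Just to confirm, that's {day_name} the {ordinal(slot.day)} "
            f"at {clock}. Does that work?")


print(confirm_prompt(datetime(2026, 9, 15, 15, 0)))
# Just to confirm, that's Tuesday the 15th at 3 PM. Does that work?
```

Only after the caller says yes would the booking tool call fire; a "no" loops back to asking for a date.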

Order status and basic account questions. "Where's my order?" "Has my payment been processed?" These are well-structured tool calls with simple natural-language wrappers.

Password resets and account verification. With proper SMS-based verification or PIN-back, these are routine.

Tier-1 support tickets. The 60–80% of inbound requests that follow a known pattern, not the long-tail edge cases.

Outbound qualification calls. Following a script, capturing answers, scoring, booking a demo or moving to a human SDR.

After-hours coverage. Picking up calls when the office is closed. The bar is low (the alternative is voicemail) and the win is large.

What kind of works but needs care

These work for many teams but fail for some — usually due to operational rather than technical reasons:

Refunds and cancellations. The agent can do them, but you need policies for "how much can the agent approve before escalating" and you need the agent to be very clear about disclosing the policy to the customer.
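
One way to make the approval policy explicit is to encode it as data and have the agent read the limit out loud. The sketch below is an assumption-laden illustration: `RefundPolicy`, `RefundDecision`, `decide_refund`, and the 50.00 limit are all hypothetical names and values, not a description of any specific product.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RefundPolicy:
    # Hypothetical limit: the largest refund the agent may approve on its own.
    auto_approve_limit: Decimal


@dataclass(frozen=True)
class RefundDecision:
    action: str      # "approve" or "escalate"
    disclosure: str  # what the agent says to the customer


def decide_refund(amount: Decimal, policy: RefundPolicy) -> RefundDecision:
    """Approve within the limit, escalate above it, and disclose the limit either way."""
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    limit = f"${policy.auto_approve_limit:.2f}"
    if amount <= policy.auto_approve_limit:
        return RefundDecision(
            "approve",
            f"I can approve refunds up to {limit} myself, "
            f"so your refund of ${amount:.2f} is approved.",
        )
    return RefundDecision(
        "escalate",
        f"I can only approve refunds up to {limit}, so I'm passing your "
        f"request for ${amount:.2f} to a colleague who can review it.",
    )


policy = RefundPolicy(auto_approve_limit=Decimal("50"))
print(decide_refund(Decimal("20"), policy).action)   # approve
print(decide_refund(Decimal("120"), policy).action)  # escalate
```

Keeping the limit in one place means the escalation rule and the spoken disclosure cannot drift apart.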

Long, multi-step troubleshooting. Walking a customer through resetting their router can work, but only if you've put the steps into the knowledge base in a structured way. Improvised diagnostics struggle.

Contextual upsells. "Have you considered upgrading?" works when the agent knows the customer well; falls flat otherwise. Easy to make annoying.

Multilingual conversations. English, Spanish, French, and Portuguese are excellent. Mandarin, Japanese, and Arabic are good. Lower-resource languages can be brittle. Always test on real audio in your target language.

Complex form-filling. Capturing 10 fields of information over voice is doable but tedious; the better pattern is "capture the critical ones over voice and SMS the customer a link for the rest."
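
A rough sketch of that split, assuming each field is tagged as critical or not. The `FormField` type, the field names, and the example.com URL are placeholders for illustration; sending the SMS itself is left to whatever messaging provider you use.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FormField:
    name: str
    critical: bool  # True if the call cannot proceed without this field


def split_by_channel(fields: Sequence[FormField]) -> Tuple[List[str], List[str]]:
    """Return (fields to ask for over voice, fields to collect via the SMS link)."""
    voice = [f.name for f in fields if f.critical]
    link = [f.name for f in fields if not f.critical]
    return voice, link


def sms_body(link_fields: Sequence[str], url: str) -> str:
    """Compose the follow-up text message; the URL is supplied by the caller."""
    return (f"Thanks for calling. Please fill in the remaining "
            f"{len(link_fields)} details here: {url}")


fields = [
    FormField("full_name", critical=True),
    FormField("callback_number", critical=True),
    FormField("mailing_address", critical=False),
    FormField("preferred_contact_time", critical=False),
    FormField("referral_source", critical=False),
]
voice, link = split_by_channel(fields)
print(voice)  # ['full_name', 'callback_number']
print(sms_body(link, "https://example.com/form/abc123"))
```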

What doesn't work yet

An honest list of things voice AI still handles poorly in 2026:

Highly emotional contexts. Bereavement, escalated complaints, sensitive medical conversations, mental health support. Voice AI can be present, but it shouldn't be the primary respondent.

Long unstructured conversations with multiple intents. A 20-minute call that morphs from billing to features to a complaint. The agent loses track or hands off too early.

Account verification with messy data. Reading back a 10-character account number with hyphens and capital letters over voice fails too often. The fix is DTMF (keypad entry) or a different channel.

Numbers and names with no context. Even great STT systems mis-hear "Vyas" as "Vias" or "Buy us." The fix is custom vocabularies, biased decoding, or a confirm-back step.
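
As a sketch of the confirm-back idea, the snippet below snaps a transcript to a small custom vocabulary using fuzzy matching from Python's standard `difflib`, then spells the candidate back to the caller. This is a post-processing stand-in, not biased decoding inside the STT engine, and the surname list, the 0.6 cutoff, and the function names are assumptions for illustration.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical custom vocabulary, e.g. surnames already in your CRM.
KNOWN_SURNAMES = ["Vyas", "Nguyen", "Okafor"]


def resolve_name(heard: str, vocabulary: Sequence[str],
                 cutoff: float = 0.6) -> Optional[str]:
    """Return the closest vocabulary entry to the transcript, or None if nothing is close."""
    by_lower = {entry.lower(): entry for entry in vocabulary}
    matches = difflib.get_close_matches(
        heard.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


def confirm_back_prompt(heard: str, vocabulary: Sequence[str]) -> str:
    """Spell the best candidate back, or ask the caller to spell the name."""
    candidate = resolve_name(heard, vocabulary)
    if candidate is None:
        return "Sorry, could you spell your last name for me?"
    spelled = ", ".join(candidate.upper())
    return f"I have {candidate}, spelled {spelled}. Is that right?"


print(confirm_back_prompt("Vias", KNOWN_SURNAMES))
# I have Vyas, spelled V, Y, A, S. Is that right?
print(confirm_back_prompt("Buy us", KNOWN_SURNAMES))
# Sorry, could you spell your last name for me?
```

The near miss ("Vias") is recovered from the vocabulary, while the badly garbled transcript ("Buy us") falls below the cutoff and triggers a spell-it-out fallback rather than a wrong guess.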

Real-time sensitive negotiation. Closing a high-value contract, navigating a tricky liability conversation. The judgment isn't there yet.

Anything that requires watching the customer. Reading body language, noticing they're distracted, etc. Voice is voice.

What's improving fast

The frontier in 2026 is moving in three places:

Latency. End-to-end round-trip times under 350 ms are now achievable in production. A year ago that was a research demo.
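
To see what a 350 ms target implies, it helps to add up a per-stage budget. The stage names and millisecond values below are made-up illustrative numbers, not measurements from any system; only the 350 ms target comes from the text above.

```python
from typing import Dict

TARGET_MS = 350  # the round-trip figure quoted above


def latency_report(stage_ms: Dict[str, int], target_ms: int = TARGET_MS) -> dict:
    """Sum the per-stage latencies and report headroom against the target."""
    total = sum(stage_ms.values())
    slowest = max(stage_ms, key=stage_ms.get)
    return {
        "total_ms": total,
        "headroom_ms": target_ms - total,
        "slowest_stage": slowest,
        "within_target": total <= target_ms,
    }


# Hypothetical budget for one conversational turn.
example_budget = {
    "network": 60,
    "speech_to_text": 90,
    "model_first_token": 120,
    "text_to_speech_first_audio": 70,
}
print(latency_report(example_budget))
# {'total_ms': 340, 'headroom_ms': 10, 'slowest_stage': 'model_first_token', 'within_target': True}
```

With numbers like these there are only 10 ms of slack, which is why a single slow tool call or an extra network hop is enough to push a turn over the target.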

Multilingual fluency. TTS quality in Hindi, Vietnamese, and Arabic has improved dramatically in the last 12 months.

Multi-agent orchestration. A "supervisor" agent that routes turns to specialized sub-agents (a billing expert, a tech support expert) is increasingly common. This pattern handles complex multi-intent calls better than a single monolithic agent.
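
The routing shape can be sketched in a few lines of Python. Everything here is a simplification under stated assumptions: the keyword lists stand in for what would normally be a model-based intent classifier, and `billing_agent`, `tech_support_agent`, and `supervise` are hypothetical names, with canned replies in place of real sub-agents.

```python
from typing import Callable, Dict, Tuple


def billing_agent(utterance: str) -> str:
    return "Billing specialist: let me pull up your recent charges."


def tech_support_agent(utterance: str) -> str:
    return "Tech support specialist: let's go through that step by step."


# Keyword matching is a placeholder for a real intent classifier.
SUB_AGENTS: Dict[str, Tuple[Tuple[str, ...], Callable[[str], str]]] = {
    "billing": (("bill", "charge", "invoice", "payment", "refund"),
                billing_agent),
    "tech_support": (("router", "error", "not working", "reset", "install"),
                     tech_support_agent),
}


def supervise(utterance: str) -> Tuple[str, str]:
    """Route one caller turn to a sub-agent, or to a human if nothing matches."""
    text = utterance.lower()
    for name, (keywords, agent) in SUB_AGENTS.items():
        if any(keyword in text for keyword in keywords):
            return name, agent(utterance)
    return "human", "Let me connect you with a person who can help."


# A call that changes topic mid-way is routed turn by turn.
turns = [
    "Why was I charged twice this month?",
    "Also, my router keeps dropping the connection.",
    "Can I speak to someone about a complaint?",
]
print([supervise(turn)[0] for turn in turns])
# ['billing', 'tech_support', 'human']
```

Because routing happens per turn rather than per call, a topic change does not require the first sub-agent to handle something outside its scope.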

The single biggest predictor of success

Across many deployments, the variable that most predicts whether a voice agent project succeeds isn't the technology — it's whether the team has done the operational work:

Picked a bounded use case with clear success criteria.
Defined what escalation looks like and when it fires.
Built an evaluation harness to grade agent calls.
Set realistic expectations about handling time and resolution rate.
Allocated someone whose job includes monitoring agent quality post-launch.

Teams that do this work ship voice agents successfully. Teams that skip it ship things that demo well and fail in production.
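
The evaluation harness in that list can start very small. The sketch below assumes each call has already been graded (by a human reviewer or an automated grader) into a `GradedCall` record; the record fields and the `summarize` function are hypothetical, and a real harness would add per-intent breakdowns and transcript review.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class GradedCall:
    call_id: str
    resolved: bool         # grader judged the caller's issue resolved
    escalated: bool        # call was handed off to a human
    handle_seconds: float  # total call duration


def summarize(calls: Sequence[GradedCall]) -> Dict[str, float]:
    """Aggregate graded calls into the headline metrics a team tracks weekly."""
    if not calls:
        raise ValueError("need at least one graded call")
    n = len(calls)
    # Count a call as resolved by the agent only if no human handoff occurred.
    contained = sum(1 for c in calls if c.resolved and not c.escalated)
    escalated = sum(1 for c in calls if c.escalated)
    return {
        "resolution_rate": contained / n,
        "escalation_rate": escalated / n,
        "avg_handle_seconds": sum(c.handle_seconds for c in calls) / n,
    }


sample = [
    GradedCall("c1", resolved=True, escalated=False, handle_seconds=100),
    GradedCall("c2", resolved=True, escalated=False, handle_seconds=200),
    GradedCall("c3", resolved=True, escalated=False, handle_seconds=300),
    GradedCall("c4", resolved=True, escalated=True, handle_seconds=400),
]
print(summarize(sample))
# {'resolution_rate': 0.75, 'escalation_rate': 0.25, 'avg_handle_seconds': 250.0}
```

Tracking these three numbers per week gives the person who owns agent quality something concrete to watch after launch.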

For the deployment playbook, see voice agent onboarding: a 30-day plan for support teams.

FAQ

Are voice agents ready to replace my call center? For tier-1 inbound, mostly yes — with proper escalation. For complex, judgment-heavy work, no. The best deployments augment human agents rather than replacing them outright.

Can voice agents handle accents? Modern STT handles most major accents well in English. Heavy regional accents and code-switching (mixing two languages mid-sentence) are still hard.

What's the failure rate? A well-tuned voice agent should resolve 60–80% of bounded inbound calls without human handoff, which means 20–40% still reach a person. If resolution falls below 50%, your use case probably needs more constraint or your prompt needs work.

Can the agent handle a follow-up question? Yes — multi-turn within a session is the strong suit. Multi-session memory (remembering the caller from yesterday) is possible but requires an explicit memory layer.

Will the customer know it's an AI? Usually, yes. Modern voice agents are very good, but most callers can still tell. Some teams disclose proactively ("I'm a virtual assistant"); others let the conversation speak for itself. Disclosure is required by law in some U.S. states for outbound calls.

What Voice Agents Can and Can't Do in 2026
