What Is AI Deflection (and How to Measure It)
"Deflection" is the most-cited and most-misunderstood metric in AI customer support. Vendors quote 80% deflection rates. Buyers don't always know what that means or how to verify it. Let's clear up what counts as deflection, what it doesn't, and what you should actually measure instead.
TL;DR
- Deflection is the percentage of contacts that the AI handled without escalating to a human.
- Raw deflection is misleading because it doesn't account for whether the customer's issue was actually resolved.
- The better metric: containment rate (resolved without human + customer didn't call back about the same issue).
- A 70% containment rate is excellent. 40% is fine. 20% is broken.
Deflection vs containment vs resolution
Three related concepts often confused:
Deflection. The AI handled the contact; no human was involved.
Containment. Same as deflection PLUS the customer didn't return for the same issue within a defined window (usually 7–14 days).
Resolution. The customer's underlying issue was actually fixed (which sometimes requires multiple interactions across different channels).
Deflection is easy to measure. Containment is harder. Resolution is hardest.
Why raw deflection is misleading
Imagine an AI agent that always says "we're experiencing high call volume; please try again later" and hangs up. 100% deflection rate. 0% useful.
A more realistic version: an AI that tries to handle every call but can't resolve 40% of them. Those callers either:
- Hang up frustrated (counts as "deflected")
- Call back within an hour (also "deflected" if you count each call separately)
- Email or chat about the same issue (you wouldn't see this in voice metrics at all)
You think you're at 80% deflection; you're really at 40% containment.
How to compute containment honestly
The math:
Containment rate = (Calls handled by AI that didn't return) / (Total calls handled by AI)
"Didn't return" means: the same caller (matched on phone or account ID) didn't have another contact about a similar topic within N days.
Practical implementation:
- Tag each call with caller ID and intent.
- After 14 days, mark the call "contained" if no follow-up call from the same caller about the same intent.
- Track containment rate as a rolling metric.
This catches the cases where AI "handled" the call but didn't actually solve the problem.
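The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Call` record and its field names are hypothetical, it assumes intents are already normalized, and it treats any later call (AI-handled or human-handled) from the same caller with the same intent as a return. Calls with no caller ID or intent are excluded, and calls whose follow-up window is still open are skipped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Call:
    # Hypothetical record shape; adapt the fields to your own call logs.
    caller_id: Optional[str]   # phone number or account ID, None if unknown
    intent: Optional[str]      # normalized intent tag, None if unclear
    started_at: datetime
    handled_by_ai: bool


def containment_rate(calls, window_days=14, as_of=None):
    """Share of AI-handled calls with no same-caller, same-intent
    follow-up call inside the window. Returns None if nothing is eligible."""
    window = timedelta(days=window_days)
    as_of = as_of or datetime.now()
    ordered = sorted(calls, key=lambda c: c.started_at)

    eligible = 0
    contained = 0
    for i, call in enumerate(ordered):
        if not call.handled_by_ai:
            continue  # only AI-handled calls are in the denominator
        if not call.caller_id or not call.intent:
            continue  # can't match follow-ups, so exclude
        if call.started_at + window > as_of:
            continue  # follow-up window hasn't closed yet
        eligible += 1
        returned = any(
            later.caller_id == call.caller_id
            and later.intent == call.intent
            and call.started_at < later.started_at <= call.started_at + window
            for later in ordered[i + 1:]
        )
        if not returned:
            contained += 1

    return contained / eligible if eligible else None
```

Running this over a trailing period (say, the last 30 days of calls whose windows have closed) gives the rolling metric. The nested scan is quadratic in the number of calls; at real volumes you would likely group calls by caller first or do the matching in SQL.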
The intent-tagging problem
Computing containment requires knowing what the call was about. Most platforms support intent tagging:
- The agent self-reports the intent ("this call was about order status").
- An LLM categorizes after the fact.
- A human reviewer tags a sample.
Whichever you use, the tagging must be consistent across calls or your follow-up matching won't work.
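One way to keep tags consistent, whichever source they come from, is to map every raw label onto a small canonical taxonomy before matching follow-ups. The sketch below assumes a hand-maintained alias table; the intent names and aliases are made up for illustration, and anything unrecognized falls into an explicit `unknown` bucket instead of creating a new label.

```python
# Hypothetical taxonomy: canonical intent -> raw labels that should map to it.
INTENT_ALIASES = {
    "order_status": {"where is my order", "track order", "order tracking"},
    "billing": {"invoice", "billing question", "charge dispute"},
}


def normalize_intent(raw_tag: str) -> str:
    """Map a free-form intent label to a canonical one, or 'unknown'."""
    # Lowercase, treat '_' and '-' as spaces, and collapse repeated whitespace.
    cleaned = " ".join(raw_tag.lower().replace("_", " ").replace("-", " ").split())
    for canonical, aliases in INTENT_ALIASES.items():
        if cleaned == canonical.replace("_", " ") or cleaned in aliases:
            return canonical
    return "unknown"
```

Calls tagged `unknown` are the ones you can't match reliably, so it is safer to exclude them from the containment calculation than to guess.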
What's a good containment rate
Approximate benchmarks:
- 70%+: excellent. Your AI is genuinely solving problems.
- 50–70%: solid. Most teams land here after iteration.
- 30–50%: working but with room. Investigate failed cases.
- Under 30%: something's broken. Audit your prompt and escalation logic.
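If you want these bands on a dashboard or in an alert, a small helper is enough. This sketch encodes the approximate benchmarks above; how the exact boundaries (50% and 30%) are assigned is an assumption, since the bands as listed share their endpoints.

```python
def containment_band(rate: float) -> str:
    """Classify a containment rate (0.0 to 1.0) into the bands above."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a fraction between 0 and 1")
    if rate >= 0.70:
        return "excellent"
    if rate >= 0.50:   # assumption: exactly 50% counts as solid
        return "solid"
    if rate >= 0.30:   # assumption: exactly 30% counts as working
        return "working, with room"
    return "broken"
```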
When deflection is the right metric
Sometimes deflection IS what you care about. Examples:
- After-hours coverage where the alternative is voicemail. Any handled call is a win even if only partial.
- High-volume top-of-funnel triage where you just need to route, not resolve.
- Cost-pressured contexts where reducing human-handled volume is the goal regardless of resolution quality.
For these, raw deflection is fine. For mature support agents, containment is the better metric.
What to do with low deflection
If your deflection is lower than expected:
Audit the escalation criteria. Maybe the AI is escalating too easily.
Review escalated calls. What patterns do they share? Often a single missing capability accounts for many escalations.
Expand the knowledge base. Often the agent escalates because it doesn't have the answer.
Add new functions. Sometimes the agent escalates because it can't do what's needed.
What to do with low containment
If deflection is high but containment is low:
Listen to the deflected calls. What did the AI say? Did the customer seem satisfied or did they hang up frustrated?
Survey returning callers. "We see you called twice this week. What could we have done better?"
Tighten resolution criteria. Maybe the AI is marking calls as "resolved" prematurely.
Improve handoff to other channels. A call that ends with "I'll email you" should track whether the email actually solved it.
A worked example
100,000 monthly calls. The AI handles 80,000 of them (the rest are escalated immediately for various reasons).
Of those 80,000:
- 50,000 callers don't return: contained.
- 15,000 callers return within 14 days about the same issue: not contained.
- 15,000 we can't tell (no caller ID, intent unclear): exclude.
Containment rate: 50,000 / 65,000 = 77%.
Deflection rate: 80,000 / 100,000 = 80%.
Most vendors would report the 80%. The honest metric is 77%.
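The same arithmetic as a short script, using only the numbers from this example. The function names are illustrative. The last line is an extra check that is not part of the example above: a conservative floor that counts the 15,000 unknowns as not contained.

```python
def deflection_rate(total_calls: int, ai_handled: int) -> float:
    return ai_handled / total_calls


def containment_rate(contained: int, returned: int) -> float:
    # Unknowns are excluded from both numerator and denominator.
    return contained / (contained + returned)


total_calls = 100_000
ai_handled = 80_000
contained, returned, unknown = 50_000, 15_000, 15_000

# Sanity check: the three buckets account for every AI-handled call.
assert contained + returned + unknown == ai_handled

print(f"Deflection:  {deflection_rate(total_calls, ai_handled):.0%}")  # 80%
print(f"Containment: {containment_rate(contained, returned):.0%}")     # 77%

# Conservative floor: treat every unknown as not contained.
print(f"Floor:       {contained / ai_handled:.1%}")                    # 62.5%
```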
Cost tied to containment
The economic value: each contained call saves roughly the cost of one human-handled call. The non-contained ones don't.
So in the example above, 77% containment translates to 50,000 calls' worth of human cost savings, not the 80,000 that the deflection figure suggests. That's the actual ROI.
For more on cost math, see how to calculate ROI for AI customer support.
Related reading
- CSAT for AI Agents: Benchmarks and Frameworks
- How to Tag and Categorize AI Conversations
- Quality Assurance for AI Voice Support
- Cutting Average Handle Time with Voice Agents
- Why First-Contact Resolution Is the North Star for AI Support
FAQ
What's a good follow-up window? 14 days for general support. 7 days for time-sensitive issues. 30 days for less urgent contexts.
Can I count cross-channel returns? Ideally yes: if the customer who called also chatted later about the same issue, that's not contained. This requires a unified customer view.
What about partial resolutions? Some teams use a 3-tier scale: contained, partial, escalated. More nuanced but harder to track consistently.
Should I report deflection or containment to my CFO? Containment. Tie it to cost savings. Easier to defend in budget conversations.
What if my containment metric is suspicious? Spot-check 50 "contained" calls manually. Listen to the audio. Were they actually resolved? Calibrate from there.

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems: text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
