💬 Customer Support Automation

What Is AI Deflection (and How to Measure It)

Tyler Weitzman
January 26, 2026 · 5 min read
Speechify

"Deflection" is the most-cited and most-misunderstood metric in AI customer support. Vendors quote 80% deflection rates. Buyers don't always know what that means or how to verify it. Let's clear up what counts as deflection, what it doesn't, and what you should actually measure instead.

TL;DR

  • Deflection is the percentage of contacts that the AI handled without escalating to a human.
  • Raw deflection is misleading because it doesn't account for whether the customer's issue was actually resolved.
  • The better metric: containment rate (handled without a human, and the customer didn't come back about the same issue).
  • A 70% containment rate is excellent. 40% is fine. 20% is broken.

Deflection vs containment vs resolution

Three related concepts often confused:

Deflection. The AI handled the contact; no human was involved.

Containment. Same as deflection PLUS the customer didn't return for the same issue within a defined window (usually 7–14 days).

Resolution. The customer's underlying issue was actually fixed (which sometimes requires multiple interactions across different channels).

Deflection is easy to measure. Containment is harder. Resolution is hardest.

Why raw deflection is misleading

Imagine an AI agent that always says "We're experiencing high call volume. Please try again later." and hangs up. 100% deflection rate. 0% useful.

A more realistic version: an AI that tries to handle every call but can't resolve 40% of them. Those callers either:

  • Hang up frustrated (counts as "deflected")
  • Call back within an hour (also "deflected" if you count each call separately)
  • Email or chat about the same issue (you wouldn't see this in voice metrics at all)

You think you're at 80% deflection; you're really at 40% containment.
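To see how counting each call separately inflates the number, here is a minimal Python sketch over a hypothetical call log. The record fields (`caller`, `intent`, `escalated`) and both function names are illustrative assumptions, not any platform's schema, and the time window is ignored for brevity. It compares per-call deflection with the share of distinct (caller, intent) issues that needed only one AI-handled contact.

```python
from collections import Counter

# Hypothetical call log: one record per call. Field names are illustrative.
calls = [
    {"caller": "A", "intent": "billing", "escalated": False},
    {"caller": "A", "intent": "billing", "escalated": False},  # callback, same issue
    {"caller": "A", "intent": "billing", "escalated": False},  # another callback
    {"caller": "B", "intent": "order_status", "escalated": False},
    {"caller": "C", "intent": "refund", "escalated": True},
]

def per_call_deflection(calls):
    """Share of calls finished without a human, counting every call separately."""
    return sum(not c["escalated"] for c in calls) / len(calls)

def one_contact_issue_rate(calls):
    """Share of distinct (caller, intent) issues handled in one AI-only contact."""
    contacts = Counter((c["caller"], c["intent"]) for c in calls)
    escalated = {(c["caller"], c["intent"]) for c in calls if c["escalated"]}
    good = sum(1 for key, n in contacts.items() if n == 1 and key not in escalated)
    return good / len(contacts)

print(per_call_deflection(calls))     # 0.8
print(one_contact_issue_rate(calls))  # 1 of 3 issues, about 0.33
```

Caller A's three billing calls all count as "deflected" in the per-call view, even though the repeat contacts show the issue kept coming back.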

How to compute containment honestly

The math:

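Containment rate = (Calls handled by AI that didn't return)
                 / (Total calls handled by AI)

"Didn't return" means the same caller (matched on phone number or account ID) had no other contact about a similar topic within N days. Calls you can't match (no caller ID, or an unclear intent) are left out of both the numerator and the denominator.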
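A minimal Python sketch of the formula, assuming each AI-handled call has already been labeled `True` (didn't return), `False` (returned about the same issue), or `None` (can't tell). The function name `containment_rate` is hypothetical. Unknown outcomes are dropped from both sides of the ratio, matching how the worked example later in this article treats them.

```python
from typing import Iterable, Optional

def containment_rate(outcomes: Iterable[Optional[bool]]) -> float:
    """Containment rate over AI-handled calls.

    True  = caller did not return within the window (contained)
    False = caller returned about the same issue (not contained)
    None  = can't tell (no caller ID or unclear intent): excluded entirely
    """
    known = [o for o in outcomes if o is not None]
    if not known:
        raise ValueError("no AI-handled calls with a known outcome")
    return sum(known) / len(known)  # True counts as 1, False as 0

print(containment_rate([True, True, False, None]))  # 2 of 3 known calls contained
```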

Practical implementation:

  1. Tag each call with caller ID and intent.
  2. After 14 days, mark the call "contained" if the same caller made no follow-up call about the same intent.
  3. Track containment rate as a rolling metric.
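The three steps above can be sketched in Python, assuming a hypothetical `Contact` record that already carries a caller ID and a canonical intent tag. `label_contained` is an illustrative name, not a platform API. Calls whose 14-day window hasn't closed yet, or that can't be matched, come back as `None` so they stay out of the rate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

@dataclass
class Contact:
    caller_id: Optional[str]  # phone number or account ID; None if unknown
    intent: Optional[str]     # canonical intent tag; None if unclear
    at: datetime              # when the contact started
    handled_by_ai: bool       # True if the AI finished it without a human

def label_contained(
    contacts: List[Contact],
    window: timedelta = timedelta(days=14),
    now: Optional[datetime] = None,
) -> List[Tuple[Contact, Optional[bool]]]:
    """Label each AI-handled contact as contained (True), returned (False),
    or unknown (None: unmatched caller/intent, or window not yet elapsed)."""
    now = now or datetime.now()
    labeled = []
    for c in contacts:
        if not c.handled_by_ai:
            continue
        if c.caller_id is None or c.intent is None or now - c.at < window:
            labeled.append((c, None))
            continue
        # Any later contact (AI or human) from the same caller about the
        # same intent inside the window means the issue came back.
        returned = any(
            o.caller_id == c.caller_id
            and o.intent == c.intent
            and c.at < o.at <= c.at + window
            for o in contacts
        )
        labeled.append((c, not returned))
    return labeled

# Usage with made-up data, evaluated 30 days after the first call.
t0 = datetime(2026, 1, 1, 9, 0)
log = [
    Contact("+15550001", "billing", t0, True),
    Contact("+15550001", "billing", t0 + timedelta(days=3), True),  # callback
    Contact("+15550002", "order_status", t0, True),
    Contact(None, "refund", t0, True),                              # no caller ID
]
labels = [outcome for _, outcome in label_contained(log, now=t0 + timedelta(days=30))]
print(labels)  # [False, True, True, None]
```

The nested scan is quadratic and only meant to show the logic. A real pipeline would group contacts by (caller, intent) first. If chat and email contacts are logged under the same account ID, the same matching also catches cross-channel returns. For step 3, one option is to recompute the rate daily over the calls whose windows closed recently.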

This catches the cases where AI "handled" the call but didn't actually solve the problem.

The intent-tagging problem

Computing containment requires knowing what the call was about. Most platforms support intent tagging:

  • The agent self-reports the intent ("this call was about order status").
  • An LLM categorizes after the fact.
  • A human reviewer tags a sample.

Whichever you use, the tagging must be consistent across calls or your follow-up matching won't work.
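One way to keep tags consistent is to force every label, whether self-reported by the agent or produced by an LLM, through a single normalizer onto a fixed taxonomy. The sketch below is a minimal Python illustration. The taxonomy, the synonym map, and `normalize_intent` are all hypothetical placeholders for your own categories.

```python
import re
from typing import Optional

# Hypothetical canonical taxonomy and synonym map; replace with your own.
CANONICAL_INTENTS = {"order_status", "billing", "refund", "cancel_account"}
SYNONYMS = {
    "where is my order": "order_status",
    "order tracking": "order_status",
    "charge question": "billing",
    "invoice": "billing",
    "money back": "refund",
}

def normalize_intent(raw: Optional[str]) -> Optional[str]:
    """Map a free-text intent label to one canonical tag, or None if unknown."""
    if not raw:
        return None
    # Lowercase, turn punctuation/underscores into spaces, collapse whitespace.
    key = re.sub(r"[^a-z ]+", " ", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    slug = key.replace(" ", "_")
    if slug in CANONICAL_INTENTS:
        return slug
    return SYNONYMS.get(key)  # None means "unclear intent": exclude from matching

print(normalize_intent("Where is my order?"))  # order_status
```

Returning `None` for unmapped labels, rather than guessing, keeps ambiguous calls out of the follow-up matching instead of producing false matches.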

What's a good containment rate

Approximate benchmarks:

  • 70%+: excellent. Your AI is genuinely solving problems.
  • 50–70%: solid. Most teams land here after iteration.
  • 30–50%: working, but with room to improve. Investigate the failed cases.
  • Under 30%: something's broken. Audit your prompt and escalation logic.
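If you report containment on a dashboard, the bands above can be encoded directly. This is a small Python sketch with a hypothetical function name. The article's ranges share endpoints, so placing exactly 70%, 50%, and 30% in the higher band is an assumption.

```python
def containment_band(rate: float) -> str:
    """Map a containment rate (0.0 to 1.0) to the approximate benchmark bands.

    Boundary values (0.70, 0.50, 0.30) go to the higher band by assumption.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate >= 0.70:
        return "excellent"
    if rate >= 0.50:
        return "solid"
    if rate >= 0.30:
        return "working, with room to improve"
    return "broken: audit prompt and escalation logic"

print(containment_band(0.77))  # excellent
```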

When deflection is the right metric

Sometimes deflection IS what you care about. Examples:

  • After-hours coverage where the alternative is voicemail. Any handled call is a win even if only partial.
  • High-volume top-of-funnel triage where you just need to route, not resolve.
  • Cost-pressured contexts where reducing human-handled volume is the goal regardless of resolution quality.

For these, raw deflection is fine. For mature support agents, containment is the better metric.

What to do with low deflection

If your deflection is lower than expected:

Audit the escalation criteria. Maybe the AI is escalating too easily.

Review escalated calls. What patterns do they share? Often a single missing capability accounts for many escalations.

Expand the knowledge base. Often the agent escalates because it doesn't have the answer.

Add new functions. Sometimes the agent escalates because it can't do what's needed.
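A quick way to look for shared patterns in escalated calls is to tally them by intent and escalation reason. The Python sketch below assumes each escalated call carries a hypothetical `reason` label, taken from your tagging or from a reviewer's note. The labels and `top_escalation_patterns` are illustrative only.

```python
from collections import Counter

# Hypothetical escalated-call records with illustrative reason labels.
escalated_calls = [
    {"intent": "refund", "reason": "missing_function:issue_refund"},
    {"intent": "refund", "reason": "missing_function:issue_refund"},
    {"intent": "billing", "reason": "kb_gap"},
    {"intent": "refund", "reason": "missing_function:issue_refund"},
    {"intent": "order_status", "reason": "customer_requested_human"},
]

def top_escalation_patterns(calls, n=3):
    """Return the n most common (intent, reason) pairs with count and share."""
    counts = Counter((c["intent"], c["reason"]) for c in calls)
    total = len(calls)
    return [(pair, k, k / total) for pair, k in counts.most_common(n)]

for (intent, reason), k, share in top_escalation_patterns(escalated_calls):
    print(f"{intent:14} {reason:32} {k:3d} ({share:.0%})")
```

In this made-up data, one missing capability (issuing refunds) accounts for 60% of escalations. That is the kind of single fix the audit is looking for.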

What to do with low containment

If deflection is high but containment is low:

Listen to the deflected calls. What did the AI say? Did the customer seem satisfied or did they hang up frustrated?

Survey returning callers. "We see you called twice this week โ€” what could we have done better?"

Tighten resolution criteria. Maybe the AI is marking calls as "resolved" prematurely.

Improve handoff to other channels. A call that ends with "I'll email you" should track whether the email actually solved it.

A worked example

100,000 monthly calls. AI handles 80,000 of them (others escalated immediately for various reasons).

Of those 80,000:

  • 50,000 callers don't return → contained.
  • 15,000 callers return within 14 days about the same issue → not contained.
  • 15,000 we can't tell (no caller ID, intent unclear) → exclude.

Containment rate: 50,000 / 65,000 = 77%.

Deflection rate: 80,000 / 100,000 = 80%.

Most vendors would report the 80%. The honest metric is 77%.
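The same arithmetic in Python, using only the figures from the example above. Note that the unknown calls leave the containment denominator but still count toward deflection.

```python
total_calls = 100_000
ai_handled = 80_000
contained = 50_000
returned = 15_000
unknown = 15_000  # no caller ID or unclear intent: excluded from containment

assert contained + returned + unknown == ai_handled

deflection_rate = ai_handled / total_calls
containment_rate = contained / (contained + returned)  # 50,000 / 65,000

print(f"Deflection:  {deflection_rate:.0%}")   # Deflection:  80%
print(f"Containment: {containment_rate:.0%}")  # Containment: 77%
```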

Cost tied to containment

The economic value: each contained call saves roughly the cost of one human-handled call. Calls that aren't contained save nothing, because the customer comes back anyway.

So in this example, 77% containment (of the 65,000 calls with a known outcome) means 50,000 calls' worth of human cost savings. That's the actual ROI.
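To see the gap in dollars, here is a Python sketch that credits savings only to contained calls. `COST_PER_HUMAN_CALL` is a hypothetical placeholder, not a benchmark; substitute your own fully loaded per-call cost. These are gross figures, before the AI's own per-call cost.

```python
# Hypothetical fully loaded cost of one human-handled call; substitute your own.
COST_PER_HUMAN_CALL = 5.00

ai_handled = 80_000  # what a deflection-based ROI claim would credit
contained = 50_000   # calls confirmed not to come back within 14 days

claimed_savings = ai_handled * COST_PER_HUMAN_CALL
defensible_savings = contained * COST_PER_HUMAN_CALL

print(f"Deflection-based claim: ${claimed_savings:,.0f}")     # $400,000
print(f"Containment-based:      ${defensible_savings:,.0f}")  # $250,000
```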

For more on cost math, see how to calculate ROI for AI customer support.

FAQ

What's a good follow-up window? 14 days for general support. 7 days for time-sensitive issues. 30 days for less urgent contexts.

Can I count cross-channel returns? Ideally, yes. If a customer who called later chatted about the same issue, that call isn't contained. Counting this requires a unified customer view.

What about partial resolutions? Some teams use a 3-tier scale: contained, partial, escalated. More nuanced but harder to track consistently.

Should I report deflection or containment to my CFO? Containment. Tie it to cost savings. Easier to defend in budget conversations.

What if my containment metric is suspicious? Spot-check 50 "contained" calls manually. Listen to the audio. Were they actually resolved? Calibrate from there.
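A minimal Python sketch of the spot check, assuming you can list the IDs of calls labeled "contained". Both function names are hypothetical. A fixed seed keeps the audit sample reproducible, and the second helper turns reviewer verdicts into the share of "contained" labels that held up.

```python
import random
from typing import Iterable, List, Optional, Sequence

def spot_check_sample(contained_call_ids: Iterable[str], k: int = 50,
                      seed: Optional[int] = None) -> List[str]:
    """Draw up to k 'contained' calls uniformly at random, without replacement."""
    ids = list(contained_call_ids)
    rng = random.Random(seed)  # seeded generator so the sample can be re-drawn
    return rng.sample(ids, min(k, len(ids)))

def label_precision(reviewer_says_resolved: Sequence[bool]) -> float:
    """Share of sampled 'contained' calls a human confirmed were resolved."""
    if not reviewer_says_resolved:
        raise ValueError("no reviewed calls")
    return sum(reviewer_says_resolved) / len(reviewer_says_resolved)

sample = spot_check_sample([f"call-{i}" for i in range(50_000)], seed=7)
# Listen to each call in `sample` and record True/False per call.
```

If, for example, 45 of the 50 sampled calls were genuinely resolved, `label_precision` returns 0.9, meaning about one in ten "contained" labels is wrong.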

Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems: text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
