
How to Coach an AI Outbound Agent Like an SDR


Rohan Pavuluri
February 18, 2026 · 5 min read
Speechify

Human SDRs improve through coaching. You sit with them on calls, listen to recordings, mark what worked and what didn't, and iterate. AI outbound agents improve the same way — but the coaching mechanism is prompt engineering, example curation, and eval runs instead of direct conversation. Teams that treat their AI outbound agents like junior SDRs needing ongoing coaching outperform teams that deploy and forget.

TL;DR

  • Sample AI calls weekly; score them on a rubric.
  • Turn specific failures into prompt improvements or examples.
  • Use call-grading to create a feedback loop.
  • Coaching shows up in measurable conversion lift.
  • Assign a human "coach" specifically to the AI agent.

The coaching mindset

Stop thinking of AI outbound as "deploy and monitor metrics." Start thinking of it as "deploy and coach."

Your AI has a system prompt. The prompt is the agent's "training." Coaching = updating the prompt and examples based on what you observe.

The weekly coaching cadence

Day 1: sample. Pull 20–30 calls from the last week. Mix of outcomes (book, disqualify, hang-up, voicemail).

Day 2: score. Rate each on your quality rubric.

Day 3: identify patterns. What kinds of calls does AI do well? Where does it struggle?

Day 4: update. Adjust prompts. Add examples of good responses to tricky situations.

Day 5: test. Run the eval set against the updated prompt. Compare to the baseline.

Repeat the cycle every week. That's continuous improvement.
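Day 1's "mix of outcomes" is easy to get wrong if you simply grab the most recent calls, because one common outcome can crowd out the rest. A minimal sketch of a stratified sample in Python, assuming each call is a dict with a hypothetical `outcome` field; `sample_calls` and its parameters are illustrative, not part of any particular calling platform:

```python
import random
from collections import defaultdict


def sample_calls(calls, total=25, seed=None):
    """Draw up to `total` calls, spread as evenly as possible across outcomes.

    Assumes each call is a dict with a hypothetical "outcome" key such as
    "book", "disqualify", "hang-up", or "voicemail".
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for call in calls:
        pools[call["outcome"]].append(call)
    for pool in pools.values():
        rng.shuffle(pool)  # random order within each outcome

    picked = []
    outcomes = sorted(pools)
    # Round-robin: take one call per outcome per pass, so rare outcomes
    # still show up even when one outcome dominates the week.
    while len(picked) < total and any(pools.values()):
        for outcome in outcomes:
            if pools[outcome] and len(picked) < total:
                picked.append(pools[outcome].pop())
    return picked
```

Passing a fixed `seed` makes the weekly sample reproducible, which helps when two reviewers need to score the same calls.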

The rubric

Score calls on:

  • Disclosure. Did the AI identify itself as an AI, name the business, and state the purpose of the call?
  • Greeting quality. Warm? Confident? Natural?
  • Qualification completeness. Captured needed signals?
  • Objection handling. Responded well to pushback?
  • Empathy / tone. Matched caller's energy?
  • Next step clarity. Clear CTA?
  • Compliance. Opt-out handling, time-of-day, etc.
  • Overall naturalness.

Scale: 1–5 per dimension.
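Keeping the rubric as data makes Day 2 scoring consistent and Day 3 pattern-finding mechanical. A minimal Python sketch, assuming one 1–5 score per dimension listed above; the snake_case dimension keys, `CallScore`, and `weakest_dimensions` are hypothetical names:

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical keys for the eight rubric dimensions above.
DIMENSIONS = (
    "disclosure", "greeting", "qualification", "objection_handling",
    "empathy_tone", "next_step", "compliance", "naturalness",
)


@dataclass
class CallScore:
    call_id: str
    scores: dict  # dimension name -> integer score from 1 to 5

    def __post_init__(self):
        # Reject incomplete or out-of-range scorecards up front.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError(f"{self.call_id}: scores must cover exactly {DIMENSIONS}")
        for dim, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{self.call_id}: {dim}={value} is outside 1-5")

    def overall(self):
        return mean(self.scores.values())


def weakest_dimensions(call_scores, n=2):
    """Average each dimension across the week's calls; return the n lowest."""
    averages = {d: mean(cs.scores[d] for cs in call_scores) for d in DIMENSIONS}
    return sorted(averages, key=averages.get)[:n]
```

Averaging per dimension rather than per call is what surfaces a pattern like "objection handling is consistently weak" instead of a single mediocre overall number.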

Example coaching moment

You hear a call where the caller says "not interested" and the AI responds: "Totally understand. Before you go, can I ask just one quick question?"

That's a failure. Coach:

System prompt update:

If caller says "not interested" or equivalent:
- Accept immediately.
- Do not probe, ask questions, or attempt to redirect.
- Thank them, confirm they'll be removed from list.
- End call.

Add to examples:

Caller: "I'm not interested."
Agent: "Totally understand. Sorry for the interruption — 
I'll take you off our list. Have a good day."

Run the eval set against the updated prompt and verify the improvement before shipping.
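The "not interested" rule can also become an automated check in the eval set. A crude heuristic sketch in Python, assuming transcripts are lists of `(speaker, text)` pairs; the opt-out phrase list and `violates_not_interested_rule` are hypothetical, and a production grader would likely need more than a question-mark test:

```python
import re

# Hypothetical opt-out phrasings; extend with what your callers actually say.
OPT_OUT = re.compile(r"\b(not interested|no thanks|stop calling|take me off)\b", re.IGNORECASE)


def violates_not_interested_rule(transcript):
    """True if the agent asks anything after the caller opts out.

    `transcript` is a list of (speaker, text) pairs, speaker "caller" or "agent".
    Any agent turn containing "?" after an opt-out counts as probing.
    """
    opted_out = False
    for speaker, text in transcript:
        if speaker == "caller" and OPT_OUT.search(text):
            opted_out = True
        elif speaker == "agent" and opted_out and "?" in text:
            return True
    return False
```

Questions the agent asks before the opt-out (such as the opener's "got 90 seconds?") are deliberately ignored; only turns after it count.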

LLM evaluation

Use eval runs to compare prompt versions:

  • Feed the same sample scenarios to both prompt versions.
  • Compare responses from old vs new prompt.
  • Score each.
  • Ship the winner.
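These steps reduce to a small comparison once each scenario has a rubric score under both prompts. A minimal Python sketch, assuming scores are keyed by a hypothetical scenario ID; the `min_gain` threshold is an illustrative choice so that ties and tiny wins keep the current prompt:

```python
from statistics import mean


def pick_winner(old_scores, new_scores, min_gain=0.1):
    """Decide whether to ship a new prompt version.

    old_scores / new_scores: dict of scenario_id -> overall rubric score (1-5),
    from running the same scenarios against each prompt.
    Returns ("new" or "old", old_average, new_average).
    """
    if old_scores.keys() != new_scores.keys():
        raise ValueError("score both prompts on the same scenarios")
    old_avg = mean(list(old_scores.values()))
    new_avg = mean(list(new_scores.values()))
    # Keep the current prompt unless the new one wins by a clear margin.
    winner = "new" if new_avg - old_avg >= min_gain else "old"
    return winner, old_avg, new_avg
```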

See how to A/B test voice agent prompts and LLM evaluation for conversational agents.

The "coach" role

Assign one person to coach the AI weekly. Responsibilities:

  • Listen to calls.
  • Score against rubric.
  • Identify improvement areas.
  • Propose prompt updates.
  • Test changes.
  • Document evolution.

This is often a Sales Ops or Sales Enablement role. Some teams have dedicated AI trainers.

Feedback from AEs

AEs who take over AI-qualified calls see what the AI got right and wrong:

  • "This lead was over-qualified — AI wasted their time."
  • "AI missed the fact they wanted X."
  • "Great handoff — exactly what I needed."

Capture this feedback and fold it into the weekly coaching cycle.
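Capturing AE feedback in a consistent shape also feeds the "AE acceptance of AI-qualified leads" metric. A minimal Python sketch; the `AEFeedback` fields, the tag names, and `summarize_feedback` are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AEFeedback:
    lead_id: str
    accepted: bool  # did the AE agree the lead was properly qualified?
    tag: str        # hypothetical tags: "over_qualified", "missed_need", "good_handoff"
    note: str = ""


def summarize_feedback(feedback):
    """Return (AE acceptance rate, three most common tags) for the week."""
    if not feedback:
        return 0.0, []
    acceptance = sum(f.accepted for f in feedback) / len(feedback)
    return acceptance, Counter(f.tag for f in feedback).most_common(3)
```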

Coaching by outcome tier

Different outcomes need different coaching:

  • Book but no-show: qualification too lax?
  • Book and convert: reinforce what worked.
  • Disqualify (correctly): ensure clean exit tone.
  • Disqualify (wrongly): too strict.
  • Hang-up: opener quality.
  • Complaint: immediate review.
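The tiers above work as a lookup from outcome to coaching question, with complaints moved to the front of the review queue. A minimal Python sketch; the `Outcome` names and the `review_queue` helper are hypothetical:

```python
from enum import Enum


class Outcome(Enum):
    BOOKED_NO_SHOW = "booked_no_show"
    BOOKED_CONVERTED = "booked_converted"
    DISQUALIFIED_CORRECTLY = "disqualified_correctly"
    DISQUALIFIED_WRONGLY = "disqualified_wrongly"
    HANG_UP = "hang_up"
    COMPLAINT = "complaint"


# What to look for when reviewing each kind of call (from the list above).
COACHING_FOCUS = {
    Outcome.BOOKED_NO_SHOW: "Was qualification too lax?",
    Outcome.BOOKED_CONVERTED: "What worked? Reinforce it.",
    Outcome.DISQUALIFIED_CORRECTLY: "Was the exit tone clean?",
    Outcome.DISQUALIFIED_WRONGLY: "Was qualification too strict?",
    Outcome.HANG_UP: "Check opener quality.",
    Outcome.COMPLAINT: "Immediate review.",
}


def review_queue(calls):
    """Pair each call with its coaching focus; complaints come first.

    `calls` is a list of dicts with an "outcome" key holding an Outcome.
    The sort is stable, so other calls keep their original order.
    """
    ordered = sorted(calls, key=lambda c: c["outcome"] is not Outcome.COMPLAINT)
    return [(c, COACHING_FOCUS[c["outcome"]]) for c in ordered]
```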

Positive reinforcement

Don't just fix failures. Reinforce wins:

  • When AI handles a tough objection well, add to examples.
  • When AI nails tone with an upset caller, document the pattern.
  • That way, success patterns propagate to future calls.

Real example: opener iteration

Week 1:

  • Opener: "Hi, this is Acme's AI assistant calling about your inquiry."
  • Answer rate: 12%.

Week 2 (coached):

  • Opener: "Hi Jamie, this is Acme's AI assistant. Quick follow-up on the guide you downloaded last Tuesday — got 90 seconds?"
  • Answer rate: 19%.

The new opener is specific, personalized, and time-bounded. The coaching move: reference prior engagement, give a time budget, and ask permission.
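When reporting a change like this across coaching cycles, it helps to keep absolute and relative lift apart, since the two numbers differ a lot. Worked from the figures above:

```latex
\text{absolute lift} = 19\% - 12\% = 7\ \text{percentage points}
\qquad
\text{relative lift} = \frac{19\% - 12\%}{12\%} = \frac{7}{12} \approx 58\%
```

"7 points" and "58%" describe the same change, so either framing works as long as it is labeled.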

Building an eval set

Over time, curate:

  • Successful patterns — calls that converted.
  • Failure patterns — calls that went badly.
  • Edge cases — unusual situations.
  • Compliance tests — verify opt-out, time-of-day, etc.

Run the eval set whenever the prompt or model changes.
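An eval set is easiest to maintain when each case carries its category and a pass/fail check. A minimal Python sketch, assuming a hypothetical `agent_reply(transcript) -> str` wrapper around whichever model and prompt version is under test; `EvalCase` and `run_eval_set` are illustrative names:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    case_id: str
    category: str                 # "success", "failure", "edge", or "compliance"
    transcript: list              # conversation so far, fed to the agent
    check: Callable[[str], bool]  # True if the agent's next reply is acceptable


def run_eval_set(cases, agent_reply):
    """Run every case and return the pass rate per category (0.0 to 1.0)."""
    totals, passes = {}, {}
    for case in cases:
        reply = agent_reply(case.transcript)
        totals[case.category] = totals.get(case.category, 0) + 1
        passes[case.category] = passes.get(case.category, 0) + int(case.check(reply))
    return {category: passes[category] / totals[category] for category in totals}
```

Because the agent is passed in as a function, the same cases can be rerun unchanged whenever the prompt or model changes.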

Avoiding coaching drift

Over time, prompt updates accumulate. Risks:

  • Contradictions emerge.
  • Prompt gets long and unfocused.
  • Specific fixes overfit to specific cases.

Periodic refactoring:

  • Quarterly rewrite to consolidate learnings.
  • Remove outdated guidance.
  • Keep structure clean.

Regression testing

Every prompt change should pass regression:

  • Eval set runs automatically.
  • Compare scores to previous version.
  • Reject changes that regress.

Like code tests but for prompts.
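The "reject changes that regress" step can be a small gate in a CI job or review script. A minimal Python sketch, assuming per-category pass rates like those an eval-set run produces; `regression_gate` and its `tolerance` parameter are hypothetical:

```python
def regression_gate(previous, candidate, tolerance=0.0):
    """Compare a candidate prompt's eval pass rates to the current version's.

    previous / candidate: dict of category -> pass rate (0.0 to 1.0).
    A category missing from the candidate run counts as 0.0.
    Returns (ok, regressions), where ok is False if any category dropped
    by more than `tolerance`.
    """
    regressions = []
    for category, old_rate in previous.items():
        new_rate = candidate.get(category, 0.0)
        if new_rate < old_rate - tolerance:
            regressions.append(f"{category}: {old_rate:.2f} -> {new_rate:.2f}")
    return not regressions, regressions
```

Keeping `tolerance` at 0.0 matches "reject changes that regress" literally; a small positive value for noisy categories is a judgment call.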

Metrics that matter

  • Conversion rate lift over coaching cycles.
  • Call quality scores (rubric).
  • AE acceptance of AI-qualified leads.
  • Complaint rate trend.
  • Eval set pass rate.

Show improvement over time.

Common pitfalls

Over-coaching specific edge cases. Prompt becomes cluttered with one-off rules.

No eval set. Changes ship blind. Regressions go undetected.

Ignoring positive patterns. Only fix failures. Miss opportunities to reinforce wins.

No documentation. Nobody remembers why the prompt says what it says.

Coaching drift. Prompts change weekly; nobody tracks the trajectory.
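Two of these pitfalls, missing documentation and untracked drift, come down to not recording why each change was made. A minimal Python sketch of a prompt changelog; the `PromptVersion` fields are hypothetical, and a version-controlled prompt file with descriptive commit messages would serve the same purpose:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PromptVersion:
    version: int
    shipped: date
    change: str            # what changed in the prompt
    reason: str            # the calls, scores, or AE feedback that motivated it
    eval_pass_rate: float  # overall eval-set pass rate at ship time (0.0 to 1.0)


def trajectory(history):
    """One line per version, oldest first, so the prompt's evolution is readable."""
    return [
        f"v{v.version} ({v.shipped.isoformat()}): {v.change} | why: {v.reason} "
        f"| eval {v.eval_pass_rate:.0%}"
        for v in sorted(history, key=lambda v: v.version)
    ]
```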

FAQ

Can AI coach itself? Partially. An LLM can flag anomalies, but human judgment is still central.

How often should we coach? Weekly at minimum for active deployments; monthly for stable ones.

What about coaching on voice / tone specifically? Some vendors support emotional tone guidance in TTS. Coach via prompt + voice model selection.

Can we crowdsource coaching? Yes, multiple reviewers work, but a consistent rubric is essential.

What's the ROI of coaching? Typically 15–30% conversion improvement over 3 months with active coaching.

Rohan Pavuluri
Building SIMBA Voice Agents

Rohan Pavuluri builds SIMBA Voice Agents at Speechify. Previously, he founded and led Upsolve, the largest nonprofit in the United States serving low-income Americans through technology. He writes about real-world voice-agent deployments — customer support, outbound sales, AI receptionists — and the practical product, design, and operational lessons that actually move the needle.
