How to Score Leads From a Voice Conversation
A voice conversation is a rich source of signal for lead scoring — far richer than a form submission or a website visit. The caller tells you their role, their company, their need, their timeline, and their tone. The challenge is turning all that into a numeric score the sales team can act on. Done well, AI-powered lead scoring from voice conversations lets you route the top 20% of leads to AEs in real time, tier the rest for nurture, and measure what actually predicts closes over time.
TL;DR
- Extract structured signals from the conversation: role, company size, use case, timeline, sentiment.
- Map signals to a numeric score (0–100) via a scoring rubric tied to your ICP.
- Sentiment and urgency are often the most underused signals.
- Calibrate monthly against actual close data.
- Route based on score thresholds; iterate on thresholds.
Signals worth capturing
From a voice conversation, you can capture:
- Role / title. Self-identified during qualification.
- Company size. Explicit or inferred from role/company.
- Industry. Stated or inferred.
- Use case. The specific problem they're solving.
- Timeline. How soon they want a solution.
- Budget signal. Explicit or implicit.
- Decision authority. Solo or part of a group.
- Current solution. What are they using today?
- Urgency. How much does this matter?
- Sentiment. Neutral, enthusiastic, skeptical.
- Engagement. Asking detailed questions vs shallow.
Not all signals deserve equal weight. Your scoring model should prioritize what's predictive for your ICP.
A simple scoring rubric
Start simple. Example for a mid-market SaaS:
| Signal | Value | Points |
|---|---|---|
| Role | VP+ | 25 |
| Role | Director | 15 |
| Role | Manager | 5 |
| Company size | 500–5000 employees | 20 |
| Company size | 100–499 employees | 10 |
| Use case match | Direct | 20 |
| Use case match | Adjacent | 10 |
| Timeline | under 3 months | 20 |
| Timeline | 3–6 months | 10 |
| Budget signal | Explicit | 10 |
| Budget signal | Implicit | 5 |
| Sentiment | Positive | 5 |
Max: 100. Threshold for AE routing: 50.
Implementation
The voice agent captures signals during the call:
function capture_signals({
  role, title, company_name, company_size, industry,
  use_case_description, timeline, budget_mentioned,
  decision_authority, current_solution, urgency, sentiment
})
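Once the call ends, the captured fields need somewhere to live. A minimal sketch in Python, assuming a typed record is convenient for the post-call step; `CallSignals`, its field types, and the `missing()` helper are illustrative names, not part of any particular voice platform. Every field defaults to `None` so an evasive caller produces a partial record rather than an error.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CallSignals:
    """Signals captured during one call; None means the caller never said."""
    role: Optional[str] = None
    title: Optional[str] = None
    company_name: Optional[str] = None
    company_size: Optional[int] = None  # employee count (assumed unit)
    industry: Optional[str] = None
    use_case_description: Optional[str] = None
    timeline: Optional[str] = None
    budget_mentioned: Optional[str] = None
    decision_authority: Optional[str] = None
    current_solution: Optional[str] = None
    urgency: Optional[str] = None
    sentiment: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of the signals this call did not surface."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


signals = CallSignals(role="Director", company_size=800, timeline="this quarter")
print(signals.missing())
```

Tracking which fields are missing makes it easier to tell a genuinely low score from an incomplete one.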
Post-call, a scoring function applies the rubric:
def calculate_score(signals):
    score = 0
    score += role_score(signals.role)
    score += size_score(signals.company_size)
    score += use_case_score(signals.use_case_description)
    score += timeline_score(signals.timeline)
    score += budget_score(signals.budget_mentioned)
    score += sentiment_score(signals.sentiment)
    # Cap at 100 in case rubric weights are later raised
    return min(100, score)
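The helper functions are left undefined above. One way to fill them in, as a sketch that encodes the example rubric table directly: it assumes the signals have already been normalized into categories (`"vp_plus"`, `"direct"`, and so on) and numbers (`company_size` in employees, `timeline_months`). Those attribute names and category labels are assumptions for illustration, as is counting exactly 500 employees in the upper size tier.

```python
from types import SimpleNamespace

# Point values copied from the example rubric table.
ROLE_POINTS = {"vp_plus": 25, "director": 15, "manager": 5}
USE_CASE_POINTS = {"direct": 20, "adjacent": 10}
BUDGET_POINTS = {"explicit": 10, "implicit": 5}
SENTIMENT_POINTS = {"positive": 5}


def size_score(employees):
    if employees is None:
        return 0
    if 500 <= employees <= 5000:  # 500 counted in the upper tier (assumed)
        return 20
    if 100 <= employees < 500:
        return 10
    return 0


def timeline_score(months):
    if months is None:
        return 0
    if months < 3:
        return 20
    if months <= 6:
        return 10
    return 0


def calculate_score(signals):
    # Unknown or missing categories contribute zero points.
    score = ROLE_POINTS.get(signals.role, 0)
    score += size_score(signals.company_size)
    score += USE_CASE_POINTS.get(signals.use_case_match, 0)
    score += timeline_score(signals.timeline_months)
    score += BUDGET_POINTS.get(signals.budget_signal, 0)
    score += SENTIMENT_POINTS.get(signals.sentiment, 0)
    return min(100, score)


lead = SimpleNamespace(role="director", company_size=800, use_case_match="direct",
                       timeline_months=2, budget_signal="implicit", sentiment="neutral")
print(calculate_score(lead))  # 15 + 20 + 20 + 20 + 5 + 0 = 80
```

Keeping the point values in plain dictionaries means recalibration is a data change rather than a code change.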
LLM-assisted extraction
For free-form signals (use case, urgency, sentiment), the LLM extracts and classifies:
Prompt: "Given this call transcript, classify:
- Use case match (direct/adjacent/off-fit): [...]
- Urgency (high/medium/low): [...]
- Sentiment (positive/neutral/negative): [...]"
Post-call extraction is cleaner than trying to classify in-conversation.
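LLM output should be validated before it reaches the scorer. The sketch below builds the classification prompt and parses a JSON reply, keeping only labels from the allowed sets. It deliberately makes no model call: `raw_reply` stands in for whatever your LLM client returns, and the JSON reply format, `build_prompt`, and `parse_classification` are assumptions for illustration.

```python
import json

# Allowed labels, taken from the prompt above.
ALLOWED = {
    "use_case_match": {"direct", "adjacent", "off-fit"},
    "urgency": {"high", "medium", "low"},
    "sentiment": {"positive", "neutral", "negative"},
}


def build_prompt(transcript: str) -> str:
    options = "\n".join(f'- "{k}": one of {sorted(v)}' for k, v in ALLOWED.items())
    return (
        "Given this call transcript, classify the following and reply "
        "with a single JSON object:\n"
        f"{options}\n\nTranscript:\n{transcript}"
    )


def parse_classification(raw_reply: str) -> dict:
    """Keep only labels from the allowed sets; anything else becomes None."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for key, allowed in ALLOWED.items():
        value = str(data.get(key, "")).strip().lower()
        result[key] = value if value in allowed else None
    return result


# Hypothetical model reply: one label is outside the allowed set.
raw_reply = '{"use_case_match": "Direct", "urgency": "high", "sentiment": "excited"}'
print(parse_classification(raw_reply))
```

An out-of-vocabulary label becomes `None` rather than a guess, so the scorer treats it as a missing signal.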
Beyond the rubric
Static rubrics are a starting point. More sophisticated options:
- Regression model trained on historical call data + closed outcomes.
- Feature engineering — call duration, number of questions asked, specific keyword density.
- Time-decay — older signals weighted less if the lead is re-engaging.
Start static, evolve into ML when you have enough data.
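The feature-engineering step can start small. A sketch, assuming the transcript is available as `(speaker, text, seconds)` tuples; that shape, the `call_features` name, and the keyword list are all illustrative. It derives the three features mentioned above: call duration, questions asked by the caller, and keyword density in the caller's speech.

```python
import re


def call_features(turns, keywords):
    """turns: list of (speaker, text, seconds); keywords: set of lowercase terms."""
    caller_text = " ".join(text for speaker, text, _ in turns if speaker == "caller")
    words = re.findall(r"[a-z0-9']+", caller_text.lower())
    hits = sum(1 for w in words if w in keywords)
    return {
        "duration_sec": sum(seconds for _, _, seconds in turns),
        # Counts question marks, so it relies on a punctuated transcript.
        "questions_asked": caller_text.count("?"),
        "keyword_density": hits / len(words) if words else 0.0,
    }


# Made-up transcript for illustration only.
turns = [
    ("agent", "Thanks for calling. What brings you in today?", 4.0),
    ("caller", "We need SSO. Does your API support SAML?", 6.0),
    ("caller", "And what does pricing look like?", 3.0),
]
features = call_features(turns, keywords={"sso", "saml", "api", "pricing"})
print(features)
```

Features like these can sit alongside the rubric signals as inputs once there is enough outcome data to train on.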
Calibration
Every month, cross-reference AI scores with outcomes:
- What did scores predict?
- What % of high-score leads became opportunities?
- What % closed?
- False positives: high-score, didn't close → what was the miss?
- False negatives: low-score, closed anyway → what was missed?
Adjust rubric weights based on findings.
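The monthly check can be a few lines over exported CRM data. A sketch, assuming each lead is a `(score, closed)` pair and using the AE routing threshold of 50 from the example rubric; `calibration_report` is a hypothetical helper, and the sample data is made up.

```python
def calibration_report(leads, threshold=50):
    """leads: list of (score, closed) pairs, where closed is True or False."""
    high = [closed for score, closed in leads if score >= threshold]
    low = [closed for score, closed in leads if score < threshold]
    return {
        "high_score_close_rate": sum(high) / len(high) if high else None,
        "low_score_close_rate": sum(low) / len(low) if low else None,
        "false_positives": len(high) - sum(high),  # high score, didn't close
        "false_negatives": sum(low),               # low score, closed anyway
    }


# Made-up outcomes for illustration only.
leads = [(85, True), (72, False), (60, True), (55, False),
         (40, False), (30, True), (10, False)]
report = calibration_report(leads)
print(report)
```

If the two close rates are close together, the rubric is not separating leads well and the weights need revisiting.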
Routing thresholds
- Score 80–100: priority-route to top AEs; offer meeting immediately.
- Score 50–79: route to appropriate AE; 24-hour follow-up.
- Score 20–49: nurture; email sequence + SDR follow-up.
- Score under 20: disqualify politely; exit.
Tune thresholds over time.
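Keeping the bands in data makes them easy to tune. A minimal sketch of the thresholds above; the action names are placeholders for whatever your CRM or routing system expects.

```python
# (minimum score, action), ordered from highest band to lowest.
ROUTING_BANDS = [
    (80, "priority_ae"),  # top AEs, offer a meeting immediately
    (50, "ae_24h"),       # appropriate AE, 24-hour follow-up
    (20, "nurture"),      # email sequence + SDR follow-up
    (0, "disqualify"),    # polite exit
]


def route(score, bands=ROUTING_BANDS):
    for floor, action in bands:
        if score >= floor:
            return action
    return bands[-1][1]  # below every floor: fall back to the lowest band


for s in (92, 63, 35, 8):
    print(s, route(s))
```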
Sentiment as signal
Voice carries sentiment — LLMs can pick up enthusiasm, frustration, skepticism. Don't underuse this:
- Enthusiastic about your product → strong buying signal.
- Skeptical / testing → needs nurture, not AE time.
- Frustrated with current solution → urgent switcher.
- Flat / tire-kicker tone → low priority.
Multi-call scoring
For leads who call multiple times:
- Aggregate signals across calls.
- Weight recency (most recent signals heavier).
- Detect patterns: a lead who re-engages is itself a strong signal.
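Recency weighting can be sketched as an exponentially decayed average. The 30-day half-life, the `(days_ago, score)` input shape, and the `aggregate_score` name are assumptions for illustration; pick a half-life that matches your sales cycle.

```python
def aggregate_score(calls, half_life_days=30.0):
    """calls: list of (days_ago, score). Returns a recency-weighted mean score."""
    if not calls:
        return None
    # A call's weight halves every half_life_days.
    weights = [0.5 ** (days_ago / half_life_days) for days_ago, _ in calls]
    weighted = sum(w * score for w, (_, score) in zip(weights, calls))
    return weighted / sum(weights)


# A lukewarm call two months ago, then a strong call today.
calls = [(60, 40), (0, 80)]
print(aggregate_score(calls))  # (0.25 * 40 + 1.0 * 80) / 1.25 = 72.0
```

The older call still counts, but today's call dominates the result.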
Cross-channel scoring
Voice score is one input. Combine with:
- Web behavior (pages viewed, time on site).
- Email engagement.
- Product signals (if self-serve exists).
A unified lead score beats a voice-only score for most modern funnels.
See inbound lead qualification with voice agents.
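One simple way to combine channels is a weighted average that renormalizes when a channel has no data. This is a sketch: the channel weights are placeholders rather than recommendations, and it assumes each channel already produces a 0–100 score.

```python
# Illustrative weights only; tune against your own close data.
CHANNEL_WEIGHTS = {"voice": 0.5, "web": 0.2, "email": 0.15, "product": 0.15}


def unified_score(channel_scores):
    """channel_scores: dict of channel -> 0-100 score, or None if no data."""
    present = {c: s for c, s in channel_scores.items()
               if s is not None and c in CHANNEL_WEIGHTS}
    if not present:
        return None
    # Renormalize so missing channels don't drag the score toward zero.
    total_weight = sum(CHANNEL_WEIGHTS[c] for c in present)
    return sum(CHANNEL_WEIGHTS[c] * s for c, s in present.items()) / total_weight


lead = {"voice": 80, "web": 40, "email": None, "product": None}
print(round(unified_score(lead), 1))
```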
Common pitfalls
Over-indexing on declared budget. Callers under-report budget. Infer from other signals.
Under-valuing timeline. Someone saying "this quarter" is hugely different from "next year."
Static rubric. Never recalibrating. Scores drift from reality.
Too many signals. Analysis paralysis. Pick 5–7 that matter.
Ignoring sentiment. Voice gives you sentiment for free; use it.
Privacy consideration
Scoring a lead based on voice content is normal CRM practice. But:
- Store scores; don't retain raw transcripts longer than necessary.
- Document the scoring model for compliance (for example, GDPR's rules on automated decision-making, often described as a "right to explanation").
- Don't score on protected characteristics (gender, ethnicity, etc.).
Observability
- Score distribution (histogram).
- % of calls scored correctly, per the most recent calibration.
- Conversion rate by score band.
- AE acceptance rate by score band.
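The score distribution is cheap to compute. A sketch using 10-point buckets, assuming integer scores from 0 to 100; the bucket width and the `score_histogram` name are illustrative, and the sample scores are made up.

```python
from collections import Counter


def score_histogram(scores, bucket=10):
    """Count integer scores per bucket; a score of 100 lands in the top bucket."""
    counts = Counter(min(s, 99) // bucket * bucket for s in scores)
    return dict(sorted(counts.items()))


# Made-up scores for illustration only.
scores = [5, 12, 18, 55, 58, 91, 100]
for floor, count in score_histogram(scores).items():
    print(f"{floor:>3}-{floor + 9:<3} {'#' * count}")
```

A distribution that piles up in one band usually means the rubric is not discriminating between leads.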
Related reading
- CSAT for AI Agents: Benchmarks and Frameworks
- What Is AI Deflection (and How to Measure It)
- When AI Should Book Meetings vs Hand Off to Humans
- Multilingual Lead Qualification: A Practical Guide
- Inbound Voice for Trade Shows and Events
FAQ
What if the caller is evasive? Partial data → partial score. Don't over-weight missing dimensions.
Can AI predict close probability directly? With enough history, yes. That's where ML beats rubrics. Many teams use a hybrid rule-plus-model approach.
Should scores be visible to AEs? Yes — transparent scores build trust. "Why was this scored 75?" is a reasonable question.
How fine-grained should scores be? 0–100 with tiers is usually enough. More granular doesn't add actionability.
What about intent data from third parties? Useful input, separate from voice scoring. Combine at the CRM level.

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems — text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
