A voice conversation is a rich source of signal for lead scoring — far richer than a form submission or a website visit. The caller tells you their role, their company, their need, their timeline, and their tone. The challenge is turning all that into a numeric score the sales team can act on. Done well, AI-powered lead scoring from voice conversations lets you route the top 20% of leads to AEs in real time, tier the rest for nurture, and measure what actually predicts closes over time.

TL;DR

Extract structured signals from the conversation: role, company size, use case, timeline, sentiment.
Map signals to a numeric score (0–100) via a scoring rubric tied to your ICP.
Sentiment and urgency are often the most underused signals.
Calibrate monthly against actual close data.
Route based on score thresholds; iterate on thresholds.

Signals worth capturing

From a voice conversation, you can capture:

Role / title. Self-identified during qualification.
Company size. Explicit or inferred from role/company.
Industry. Stated or inferred.
Use case. The specific problem they're solving.
Timeline. How soon they want a solution.
Budget signal. Explicit or implicit.
Decision authority. Solo or part of a group.
Current solution. What are they using today?
Urgency. How much does this matter?
Sentiment. Neutral, enthusiastic, skeptical.
Engagement. Asking detailed questions vs shallow.

Not all signals deserve equal weight. Your scoring model should prioritize the ones that are predictive for your ICP (ideal customer profile).

A simple scoring rubric

Start simple. Example for a mid-market SaaS:

Signal	Value	Points
Role	VP+	25
Role	Director	15
Role	Manager	5
Company size	500–5,000 employees	20
Company size	100–499 employees	10
Use case match	Direct	20
Use case match	Adjacent	10
Timeline	Under 3 months	20
Timeline	3–6 months	10
Budget signal	Explicit	10
Budget signal	Implicit	5
Sentiment	Positive	5

Maximum score: 100 (the top value for each signal: 25 + 20 + 20 + 20 + 10 + 5). Threshold for AE routing: 50.

Implementation

The voice agent captures signals during the call through a function call:

function capture_signals({
  role, title, company_name, company_size, industry,
  use_case_description, timeline, budget_mentioned,
  decision_authority, current_solution, urgency, sentiment
})
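If your voice platform supports function (tool) calling, the capture call above can be described as a JSON Schema style definition. This is a minimal sketch, not any particular vendor's format: the wrapper keys, the enum values, and the `missing_fields` helper are assumptions for illustration.

```python
import json

# Hypothetical tool definition in JSON Schema style. Field names mirror the
# capture_signals call above; enum values are illustrative assumptions.
CAPTURE_SIGNALS_TOOL = {
    "name": "capture_signals",
    "description": "Record lead-qualification signals heard during the call.",
    "parameters": {
        "type": "object",
        "properties": {
            "role": {"type": "string"},
            "title": {"type": "string"},
            "company_name": {"type": "string"},
            "company_size": {"type": "integer", "minimum": 1},
            "industry": {"type": "string"},
            "use_case_description": {"type": "string"},
            "timeline": {"type": "string"},
            "budget_mentioned": {"type": "string",
                                 "enum": ["explicit", "implicit", "none"]},
            "decision_authority": {"type": "string",
                                   "enum": ["solo", "group", "unknown"]},
            "current_solution": {"type": "string"},
            "urgency": {"type": "string", "enum": ["high", "medium", "low"]},
            "sentiment": {"type": "string",
                          "enum": ["positive", "neutral", "negative"]},
        },
        # Nothing is required: callers can be evasive, and partial data
        # is still worth recording.
        "required": [],
    },
}


def missing_fields(captured):
    """List schema fields the agent has not captured yet."""
    props = CAPTURE_SIGNALS_TOOL["parameters"]["properties"]
    return [name for name in props if captured.get(name) in (None, "")]


print(json.dumps(CAPTURE_SIGNALS_TOOL, indent=2)[:60])
print(missing_fields({"role": "VP of Sales", "timeline": "this quarter"}))
```

Leaving every field optional is a design choice: the agent can call the function with whatever it has heard so far, and `missing_fields` tells it what is still worth asking about.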

After the call, a scoring function applies the rubric:

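def calculate_score(signals):
    """Sum the rubric points for each captured signal, capped at 100."""
    # Each *_score helper maps one signal value to its rubric points and
    # should return 0 for a missing or unrecognized value.
    score = 0
    score += role_score(signals.role)
    score += size_score(signals.company_size)
    # use_case_description is free text; classify it (direct/adjacent/off-fit)
    # before assigning points.
    score += use_case_score(signals.use_case_description)
    score += timeline_score(signals.timeline)
    score += budget_score(signals.budget_mentioned)
    score += sentiment_score(signals.sentiment)
    return min(100, score)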
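One way to fill in those helper functions is a lookup table per signal, using the point values from the rubric above. A minimal sketch: the `Signals` dataclass, the normalized labels such as `"vp_plus"`, and the choice to give 0 points to companies outside the two listed size bands are assumptions, not part of the rubric itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Normalized signals; None means the caller never gave that signal."""
    role: Optional[str] = None              # "vp_plus", "director", "manager"
    company_size: Optional[int] = None      # employee count
    use_case_match: Optional[str] = None    # "direct", "adjacent", "off_fit"
    timeline_months: Optional[float] = None
    budget_signal: Optional[str] = None     # "explicit", "implicit"
    sentiment: Optional[str] = None         # "positive", "neutral", "negative"


# Point values copied from the rubric table.
ROLE_POINTS = {"vp_plus": 25, "director": 15, "manager": 5}
USE_CASE_POINTS = {"direct": 20, "adjacent": 10}
BUDGET_POINTS = {"explicit": 10, "implicit": 5}
SENTIMENT_POINTS = {"positive": 5}


def size_score(employees):
    if employees is None:
        return 0
    if 500 <= employees <= 5000:
        return 20
    if 100 <= employees < 500:
        return 10
    return 0  # assumption: sizes outside the rubric's bands earn nothing


def timeline_score(months):
    if months is None:
        return 0
    if months < 3:
        return 20
    if months <= 6:
        return 10
    return 0


def calculate_score(s):
    score = (
        ROLE_POINTS.get(s.role, 0)
        + size_score(s.company_size)
        + USE_CASE_POINTS.get(s.use_case_match, 0)
        + timeline_score(s.timeline_months)
        + BUDGET_POINTS.get(s.budget_signal, 0)
        + SENTIMENT_POINTS.get(s.sentiment, 0)
    )
    return min(100, score)


lead = Signals(role="director", company_size=1200, use_case_match="direct",
               timeline_months=4, budget_signal="implicit", sentiment="neutral")
print(calculate_score(lead))  # 15 + 20 + 20 + 10 + 5 + 0 = 70
```

Keeping the points in plain dictionaries makes monthly recalibration a data change, not a code change.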

LLM-assisted extraction

For free-form signals (use case, urgency, sentiment), the LLM extracts and classifies:

Prompt: "Given this call transcript, classify:
- Use case match (direct/adjacent/off-fit): [...]
- Urgency (high/medium/low): [...]
- Sentiment (positive/neutral/negative): [...]"

Post-call extraction is cleaner than classifying mid-conversation: the model sees the full transcript, and classification adds no latency to the live call.
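LLM output should be validated before it reaches the scoring function. The sketch below builds the prompt and parses the reply defensively; it assumes you ask the model to answer in JSON, and it deliberately leaves out the model call itself, since that depends on your provider. The names `build_prompt` and `parse_classification` are hypothetical.

```python
import json

# Labels taken from the prompt above; anything else is treated as unknown.
ALLOWED = {
    "use_case_match": {"direct", "adjacent", "off-fit"},
    "urgency": {"high", "medium", "low"},
    "sentiment": {"positive", "neutral", "negative"},
}


def build_prompt(transcript):
    return (
        "Given this call transcript, classify the following and reply with "
        "a JSON object only:\n"
        '- "use_case_match": direct | adjacent | off-fit\n'
        '- "urgency": high | medium | low\n'
        '- "sentiment": positive | neutral | negative\n\n'
        f"Transcript:\n{transcript}"
    )


def parse_classification(raw_reply):
    """Parse the model's reply; unknown or missing labels become None."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for field, allowed in ALLOWED.items():
        value = str(data.get(field, "")).strip().lower()
        result[field] = value if value in allowed else None
    return result


reply = '{"use_case_match": "Direct", "urgency": "high", "sentiment": "thrilled"}'
print(parse_classification(reply))
# {'use_case_match': 'direct', 'urgency': 'high', 'sentiment': None}
```

Mapping unexpected labels to `None` keeps a malformed reply from silently adding points; the scoring step can then treat that signal as missing.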

Beyond the rubric

Static rubrics are a starting point. More sophisticated approaches include:

Regression model trained on historical call data + closed outcomes.
Feature engineering — call duration, number of questions asked, specific keyword density.
Time-decay — older signals weighted less if the lead is re-engaging.

Start static, evolve into ML when you have enough data.
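To make the regression idea concrete, here is a tiny logistic regression fitted by gradient descent in pure Python. It is a sketch only: the two features (call duration and questions asked, both scaled) and the six training rows are made up so the example runs, and in practice you would more likely use an established library such as scikit-learn on your own call history.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Estimated close probability for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy, made-up rows: [call_minutes / 10, questions_asked / 10]; 1 = closed.
rows = [[0.2, 0.1], [0.3, 0.0], [0.4, 0.2],
        [1.2, 0.8], [1.5, 0.6], [1.0, 0.9]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
p_long_call = predict(weights, bias, [1.4, 0.8])
p_short_call = predict(weights, bias, [0.2, 0.1])
print(p_long_call > p_short_call)  # True
```

The learned weights play the same role as rubric points, except they come from outcomes instead of judgment.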

Calibration

Every month, cross-reference AI scores with outcomes:

What did scores predict?
What % of high-score leads became opportunities?
What % closed?
False positives: high-score, didn't close → what was the miss?
False negatives: low-score, closed anyway → what was missed?

Adjust rubric weights based on findings.
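The monthly review can be sketched as a small report over (score, outcome) pairs. The function name, the input shape, and the sample leads are assumptions for illustration; the threshold of 50 is the AE routing threshold from the rubric.

```python
def calibration_report(leads, threshold=50):
    """leads: (score, closed) pairs, where closed is True or False."""
    leads = list(leads)
    high = [closed for score, closed in leads if score >= threshold]
    low = [closed for score, closed in leads if score < threshold]
    return {
        "high_score_close_rate": sum(high) / len(high) if high else None,
        "low_score_close_rate": sum(low) / len(low) if low else None,
        # High score but no close: what did the rubric over-credit?
        "false_positives": sum(1 for closed in high if not closed),
        # Low score but closed anyway: what did the rubric miss?
        "false_negatives": sum(1 for closed in low if closed),
    }


# Made-up sample for illustration only.
sample = [(85, True), (72, False), (64, True),
          (40, False), (35, True), (10, False)]
report = calibration_report(sample)
print(report["false_positives"], report["false_negatives"])  # 1 1
```

If the two close rates are nearly equal, the score is not separating leads and the weights need attention.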

Routing thresholds

Score 80–100: priority-route to top AEs; offer meeting immediately.
Score 50–79: route to appropriate AE; 24-hour follow-up.
Score 20–49: nurture; email sequence + SDR follow-up.
Score under 20: disqualify politely; exit.

Tune thresholds over time.
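The tiers above translate into a small routing function. A sketch, with hypothetical destination labels; keeping the cutoffs in one list makes them easy to tune.

```python
# (minimum score, destination), checked from highest to lowest.
# Cutoffs come from the tiers above; the labels are hypothetical.
ROUTING_TIERS = [
    (80, "priority_ae"),       # offer a meeting immediately
    (50, "ae_24h_followup"),
    (20, "nurture"),           # email sequence + SDR follow-up
    (0, "disqualify"),
]


def route_lead(score):
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for minimum, destination in ROUTING_TIERS:
        if score >= minimum:
            return destination


print(route_lead(79))  # ae_24h_followup
```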

Sentiment as signal

Voice carries sentiment — LLMs can pick up enthusiasm, frustration, skepticism. Don't underuse this:

Enthusiastic about your product → strong buying signal.
Skeptical / testing → needs nurture, not AE time.
Frustrated with current solution → urgent switcher.
Flat / tire-kicker tone → low priority.

Multi-call scoring

For leads who call multiple times:

Aggregate signals across calls.
Weight recency (most recent signals heavier).
Detect patterns: a lead who re-engages is a strong signal.

Cross-channel scoring

Voice score is one input. Combine with:

Web behavior (pages viewed, time on site).
Email engagement.
Product signals (if self-serve exists).

A unified lead score beats a voice-only score for most modern funnels.
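One simple way to combine channels is a weighted average that skips channels with no data. The weights below are placeholders, not recommendations; each channel score is assumed to be on the same 0–100 scale.

```python
# Hypothetical weights; tune them against your own close data.
CHANNEL_WEIGHTS = {"voice": 0.5, "web": 0.2, "email": 0.15, "product": 0.15}


def unified_score(channel_scores):
    """Weighted average over the channels that actually have a score.
    Missing channels (None or absent) are skipped and the remaining
    weights are renormalized."""
    present = {
        channel: score
        for channel, score in channel_scores.items()
        if score is not None and channel in CHANNEL_WEIGHTS
    }
    if not present:
        return None
    total_weight = sum(CHANNEL_WEIGHTS[channel] for channel in present)
    weighted = sum(CHANNEL_WEIGHTS[ch] * s for ch, s in present.items())
    return weighted / total_weight


# Voice 80 and web 40, no email data: (0.5 * 80 + 0.2 * 40) / 0.7
print(round(unified_score({"voice": 80, "web": 40, "email": None}), 1))  # 68.6
```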

See inbound lead qualification with voice agents.

Common pitfalls

Over-indexing on declared budget. Callers under-report budget. Infer from other signals.

Under-valuing timeline. Someone saying "this quarter" is hugely different from "next year."

Static rubric. Never recalibrating. Scores drift from reality.

Too many signals. Analysis paralysis. Pick 5–7 that matter.

Ignoring sentiment. Voice gives you sentiment for free; use it.

Privacy consideration

Scoring a lead based on voice content is normal CRM practice. But:

Store scores; don't retain raw transcripts longer than necessary.
Document the scoring model for compliance (for example, GDPR rules on automated decision-making, often described as a "right to explanation").
Don't score on protected characteristics (gender, ethnicity, etc.).
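These rules can be enforced in code at the boundary of the scoring step. A minimal sketch with hypothetical names: protected fields are dropped before scoring, and the stored record carries the score and its per-signal breakdown, not the transcript.

```python
# Fields named in the list above; extend per your own legal guidance.
PROTECTED_FIELDS = {"gender", "ethnicity"}


def scoring_inputs(raw_signals):
    """Drop protected characteristics before anything is scored."""
    return {k: v for k, v in raw_signals.items() if k not in PROTECTED_FIELDS}


def stored_record(lead_id, score, breakdown, model_version="rubric-v1"):
    """What gets persisted: the score and how it was derived.
    model_version is a hypothetical label for the documented rubric."""
    return {
        "lead_id": lead_id,
        "score": score,
        "breakdown": dict(breakdown),  # points per signal, for explanations
        "model_version": model_version,
    }


signals = scoring_inputs({"role": "director", "gender": "unknown"})
record = stored_record("lead-123", 35, {"role": 15, "use_case": 20})
print(signals)  # {'role': 'director'}
```

The breakdown also answers the AE's question of why a lead received its score.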

Observability

Score distribution (histogram).
% of calls scored correctly, per the most recent calibration.
Conversion rate by score band.
AE acceptance rate by score band.
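Two of these metrics, the score histogram and conversion rate by band, can be computed with a few lines. A sketch under assumptions: scores are integers from 0 to 100, the bands reuse the routing tiers, and the sample data is made up.

```python
from collections import Counter


def score_histogram(scores, bucket_width=10):
    """Count scores per bucket, keyed by the bucket's lower bound.
    A score of 100 is folded into the top bucket."""
    top_bucket = (100 // bucket_width) - 1
    counts = Counter(
        min(score // bucket_width, top_bucket) * bucket_width
        for score in scores
    )
    return dict(sorted(counts.items()))


def conversion_by_band(leads, bands=((80, 100), (50, 79), (20, 49), (0, 19))):
    """leads: (score, converted) pairs. Returns conversion rate per band,
    or None for a band with no leads."""
    leads = list(leads)
    rates = {}
    for low, high in bands:
        outcomes = [conv for score, conv in leads if low <= score <= high]
        rates[f"{low}-{high}"] = sum(outcomes) / len(outcomes) if outcomes else None
    return rates


print(score_histogram([5, 12, 18, 55, 58, 100]))
# {0: 1, 10: 2, 50: 2, 90: 1}
print(conversion_by_band([(85, True), (90, False), (60, True), (30, False)]))
# {'80-100': 0.5, '50-79': 1.0, '20-49': 0.0, '0-19': None}
```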

FAQ

What if the caller is evasive? Partial data → partial score. Don't over-weight missing dimensions.
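One way to handle partial data is to report how much of the rubric was actually observed alongside the raw score, so a low score from an evasive caller is not mistaken for a poor fit. A sketch with hypothetical names; the maximum points per signal come from the rubric.

```python
# Maximum points per signal, from the rubric.
MAX_POINTS = {"role": 25, "company_size": 20, "use_case": 20,
              "timeline": 20, "budget": 10, "sentiment": 5}


def partial_score(points_by_signal):
    """points_by_signal: points earned per signal, None if not captured."""
    known = {k: v for k, v in points_by_signal.items() if v is not None}
    possible = sum(MAX_POINTS[k] for k in known)
    return {
        "raw": sum(known.values()),
        "possible": possible,  # the most this caller could have scored
        "coverage": possible / sum(MAX_POINTS.values()),
    }


# Caller gave role, size, use case and tone, but dodged timeline and budget.
result = partial_score({"role": 25, "company_size": 10, "use_case": 20,
                        "timeline": None, "budget": None, "sentiment": 5})
print(result)  # {'raw': 60, 'possible': 70, 'coverage': 0.7}
```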

Can AI predict close probability directly? With enough history, yes; that's where ML beats rubrics. Many teams use a hybrid rule-plus-model approach.

Should scores be visible to AEs? Yes — transparent scores build trust. "Why was this scored 75?" is a reasonable question.

How fine-grained should scores be? 0–100 with tiers is usually enough. Finer granularity doesn't make scores more actionable.

What about intent data from third parties? Useful input, separate from voice scoring. Combine at the CRM level.
