If you've written prompts for chatbots, you have a head start on voice agents — but only halfway. The fundamentals of clear instructions and tool definitions carry over. The style guide, the latency considerations, and the failure-mode handling are very different. This is the delta.

TL;DR

Voice prompts are shorter and terser, and they explicitly forbid formatting.
Voice prompts include pacing instructions (sentence length, pauses, when to bridge with chitchat).
Voice prompts have to handle interruptions, restarts, and audio quality issues that don't exist in chat.
Even on the same model, voice and chat need different prompts; don't reuse a chat prompt without adjusting it.

What stays the same

Both voice and chat agents need:

Clear identity and role
Defined goals
Tool/function definitions
Hard rules (what not to do)
Escalation criteria

These transfer between channels with minimal change.

What's different about voice prompts

Six categories of difference:

1. Forbid visual formatting

Chat agents can use bullets, headers, code blocks. Voice agents can't. Add explicit rules:

Never use bullet points, numbered lists, or markdown formatting.
Speak in conversational sentences only. If you need to convey
multiple items, use natural conversational structure ("first",
"then", "also").

Without this, the agent will sometimes read aloud "bullet point 1, bullet point 2" — embarrassing.
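
A prompt rule lowers the odds of stray markup but doesn't guarantee it, so it can help to filter the model's text before it reaches text-to-speech. Here is a minimal sketch of that safety net. The helper name `strip_markup` and its regular expressions are assumptions for illustration, not part of any voice framework, and they only cover the most common Markdown patterns.

```python
import re

# Hypothetical helper: remove common Markdown markup from model output
# before the text is handed to a text-to-speech engine.
_HEADING = re.compile(r"^\s*#{1,6}\s+", re.MULTILINE)
_LIST_MARKER = re.compile(r"^\s*(?:[-*+•]|\d+[.)])\s+", re.MULTILINE)
_EMPHASIS = re.compile(r"(\*\*|__|\*|`)(.+?)\1")


def strip_markup(text: str) -> str:
    text = _HEADING.sub("", text)       # "## Hours" -> "Hours"
    text = _LIST_MARKER.sub("", text)   # "- item" / "1. item" -> "item"
    text = _EMPHASIS.sub(r"\2", text)   # "**bold**" -> "bold"
    # Collapse line breaks so the speech engine sees one flowing utterance.
    return " ".join(text.split())
```

Treat this as a backstop, not a replacement for the prompt rule: stripped list items still sound like a list unless the model wrote them as conversational sentences in the first place.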

2. Constrain sentence length

Long sentences land badly in voice. The listener loses track.

Use short sentences. One main clause per sentence ideally.
If you need to convey complex info, break it into 2-3 short
sentences with brief pauses, not one long sentence with
multiple clauses.
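
To see whether the rule is actually holding, you can lint agent replies from your transcripts. This is a rough sketch: the 18-word budget is an assumed starting point, not a figure from this article, and the sentence splitter is deliberately naive.

```python
import re

MAX_WORDS = 18  # assumed budget; tune it by listening to real calls


def long_sentences(reply: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences in `reply` that exceed the word budget."""
    # Naive split on ., ! or ? followed by whitespace; fine for a lint pass.
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return [s for s in sentences if len(s.split()) > max_words]
```

Running it over a batch of replayed calls gives you a quick count of how often the agent drifts into long, multi-clause sentences.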

3. Specify pacing for slow operations

Voice has a real-time clock that chat doesn't. When the agent's about to do something slow, it should bridge:

When you call a function that may take more than 1.5 seconds,
first say something brief to the caller ("let me check on that"
or "one moment, looking that up"). This keeps the conversation
alive while the function runs.
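
The prompt rule asks the model to speak before it calls the function. A different approach, sketched below, does the bridging in the orchestration layer instead: start the call, and only speak a filler line if it is still running after the threshold. `call_with_bridge`, `tool_coro`, and `speak` are hypothetical names standing in for whatever your stack provides.

```python
import asyncio

BRIDGE_AFTER_S = 1.5  # threshold taken from the prompt rule above


async def call_with_bridge(tool_coro, speak, bridge_after_s=BRIDGE_AFTER_S):
    """Await a tool call; speak a filler line if it runs past the threshold.

    `tool_coro` is the awaitable tool call and `speak` is an async callable
    that sends text to TTS. Both are placeholders for your own stack.
    """
    task = asyncio.ensure_future(tool_coro)
    # Wait up to the threshold without cancelling the tool call.
    done, _pending = await asyncio.wait({task}, timeout=bridge_after_s)
    if not done:
        await speak("One moment, looking that up.")
    return await task
```

The trade-off: fast calls stay silent, which avoids an unnecessary "one moment", but the caller hears nothing for the first 1.5 seconds of a slow call.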

4. Number and date pronunciation

Tell the agent how to say numbers and dates aloud:

When confirming a phone number, account number, or PIN to the
caller, say each digit individually with brief pauses, like
"that's one, nine, seven, six". When confirming a date, say it
in natural form ("Tuesday the fifteenth at three PM"), not as
a slash-formatted date.
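
If you'd rather not rely on the model alone, you can pre-format these values in code and pass the spoken form back in the function result. A minimal sketch, assuming English output and appointments that start on the quarter hour; `speak_digits` and `speak_appointment` are hypothetical helper names.

```python
from datetime import datetime

_DIGITS = "zero one two three four five six seven eight nine".split()
_WEEKDAYS = "Monday Tuesday Wednesday Thursday Friday Saturday Sunday".split()
_HOURS = "twelve one two three four five six seven eight nine ten eleven".split()
_MINUTES = {0: "", 15: " fifteen", 30: " thirty", 45: " forty-five"}
_ORDINALS = (
    "first second third fourth fifth sixth seventh eighth ninth tenth "
    "eleventh twelfth thirteenth fourteenth fifteenth sixteenth seventeenth "
    "eighteenth nineteenth twentieth twenty-first twenty-second twenty-third "
    "twenty-fourth twenty-fifth twenty-sixth twenty-seventh twenty-eighth "
    "twenty-ninth thirtieth thirty-first"
).split()


def speak_digits(number: str) -> str:
    """Spell out each digit; separators such as '-' are skipped."""
    return ", ".join(_DIGITS[int(ch)] for ch in number if ch in "0123456789")


def speak_appointment(when: datetime) -> str:
    """Render a datetime in natural spoken form (quarter-hour slots only)."""
    if when.minute not in _MINUTES:
        raise ValueError("sketch only handles :00, :15, :30 and :45")
    period = "AM" if when.hour < 12 else "PM"
    return (f"{_WEEKDAYS[when.weekday()]} the {_ORDINALS[when.day - 1]} "
            f"at {_HOURS[when.hour % 12]}{_MINUTES[when.minute]} {period}")
```

For example, `speak_digits("1976")` gives "one, nine, seven, six", and `speak_appointment(datetime(2025, 7, 15, 15, 0))` gives "Tuesday the fifteenth at three PM".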

5. Recovery and repair patterns

Voice has more misunderstandings than chat. Pre-write the recovery moves:

If the caller corrects you (says "no" or "actually I meant..."),
acknowledge briefly ("apologies — let me update that") and
update your understanding. Don't argue with the correction.

If the caller says something you don't understand, ask one
clarifying question. If you still can't understand on the
second try, escalate to a human.
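
The "one clarifying question, then escalate" rule is easy to enforce outside the prompt as well. The sketch below tracks consecutive misunderstandings. How you decide that a turn was not understood (low transcription confidence, a flag from the model, and so on) is up to your stack and is simply assumed here as a boolean.

```python
from dataclasses import dataclass

MAX_CLARIFICATIONS = 1  # one clarifying question, then escalate


@dataclass
class RepairTracker:
    failed_attempts: int = 0

    def on_turn(self, understood: bool) -> str:
        """Return the next move: 'continue', 'clarify', or 'escalate'."""
        if understood:
            self.failed_attempts = 0  # a good turn resets the count
            return "continue"
        self.failed_attempts += 1
        if self.failed_attempts > MAX_CLARIFICATIONS:
            return "escalate"
        return "clarify"
```

Because a successful turn resets the counter, a single misheard phrase late in a long call doesn't trigger an escalation on its own.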

6. Handling silence

Chat doesn't have "the user went silent." Voice does. Tell the agent how to handle it:

If the caller hasn't spoken in 5+ seconds, ask if they're still
there ("are you still with me?"). After 15 seconds of no
response, end the call gracefully ("looks like we got
disconnected — feel free to call back").
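
A language model only responds when it is handed a turn, so in many stacks the silence timer itself lives in the runtime, which then prompts the agent to check in. Here is a sketch of that decision, assuming both thresholds are measured from the caller's last speech; the function name and return values are illustrative.

```python
CHECK_IN_AFTER_S = 5.0   # "are you still with me?"
HANG_UP_AFTER_S = 15.0   # end the call gracefully


def silence_action(silent_for_s: float, checked_in: bool) -> str:
    """Map the current silence duration to 'wait', 'check_in', or 'hang_up'."""
    if silent_for_s >= HANG_UP_AFTER_S:
        return "hang_up"
    if silent_for_s >= CHECK_IN_AFTER_S and not checked_in:
        return "check_in"
    return "wait"
```

The `checked_in` flag keeps the agent from repeating the check-in every time the timer fires between 5 and 15 seconds.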

What gets shorter

Voice prompts are usually 30–50% shorter than equivalent chat prompts. Reasons:

With no visual formatting to describe, the wording can be tighter.
Every prompt token adds to time-to-first-token (TTFT) latency.
Voice prompts are more focused (one bounded use case vs general chat).

A typical voice agent system prompt: 800–1500 tokens. A typical chat agent: 2000–4000.
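
A quick budget check can catch prompts that creep past the range. The sketch below uses the rough rule of thumb of about four characters per token for English text, which is an approximation; swap in your model's real tokenizer for anything precise.

```python
VOICE_BUDGET = (800, 1500)  # token range quoted above


def estimate_tokens(prompt: str) -> int:
    # Rough heuristic (about four characters per token for English);
    # this is not a real tokenizer.
    return max(1, len(prompt) // 4)


def budget_status(prompt: str, budget=VOICE_BUDGET) -> str:
    """Return 'under', 'within', or 'over' relative to the token budget."""
    low, high = budget
    tokens = estimate_tokens(prompt)
    if tokens < low:
        return "under"
    if tokens > high:
        return "over"
    return "within"
```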

What gets longer

A few sections that grow specifically for voice:

Voice style guide. Explicit rules about pacing, sentence length, and formatting prohibitions. 200–400 tokens.

Function-call latency hints. Telling the agent which functions are slow and to bridge. 100–200 tokens.

Recovery patterns. Pre-written correction handling. 100–300 tokens.

A complete sample structure

For a voice agent:

[Identity — 2 sentences]
You are Maya, the receptionist at Cornerstone Dental Group.

[Goal — 1 sentence]
Your job is to book new-patient appointments and reschedule
existing ones.

[Voice style — 6-10 rules]
- Speak in short sentences. One main clause each.
- No bullets, no formatting, no headers.
- Confirm appointment times in natural form ("Tuesday the fifteenth
  at three PM"). Read phone numbers back digit by digit.
- When calling a function that may be slow, say "one moment"
  first.
- (etc.)

[Tools — 3-5 functions, each with name + description]
get_available_slots(date_range)
book_appointment(slot_time, caller_id, reason)
lookup_caller_by_phone(phone)
transfer_to_human(reason)

[Hard rules — 4-8 things never to do]
- Never quote prices.
- Never agree to a refund.
- Never give medical advice.
- (etc.)

[Recovery patterns — 2-4 examples]
If the caller corrects you, acknowledge briefly and update.
If you can't understand after 2 tries, escalate.

[Escalation — when and how]
Transfer to a human if the caller is upset, asks for a manager,
or asks about anything outside scheduling.

[Greeting — the first line]
"Hi, this is Maya from Cornerstone Dental. How can I help?"

Total: ~1000 tokens. Adjust for your use case.
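
Keeping the sections as data rather than one long string makes it easier to edit and version them one at a time. This is a minimal sketch of assembling the bracketed layout shown in the sample; `build_prompt` and the dictionary shape are illustrative assumptions.

```python
def build_prompt(sections: dict) -> str:
    """Join named sections into one system prompt.

    String values are used as-is; list values become "- " rule lines,
    matching the bracketed layout in the sample above.
    """
    parts = []
    for title, body in sections.items():
        if isinstance(body, list):
            body = "\n".join(f"- {rule}" for rule in body)
        parts.append(f"[{title}]\n{body}")
    return "\n\n".join(parts)


prompt = build_prompt({
    "Identity": "You are Maya, the receptionist at Cornerstone Dental Group.",
    "Hard rules": ["Never quote prices.", "Never give medical advice."],
})
```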

Iteration discipline

The system prompt is the most-iterated artifact in any voice agent. Some tactics:

Version it. Track which prompt was live for which calls.
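
One lightweight way to do this is to derive the version ID from the prompt text itself and store it on every call record. In this sketch the record shape and function names are assumptions.

```python
import hashlib


def prompt_version(prompt: str) -> str:
    """Short, stable ID derived from the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def call_record(call_id: str, prompt: str) -> dict:
    # Store the version next to the call so it can be joined to outcomes later.
    return {"call_id": call_id, "prompt_version": prompt_version(prompt)}
```

Any edit to the prompt, even a single character, produces a new ID, so there's no way to forget to bump the version.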

A/B test. Run two versions in parallel; compare on your eval set. See how to A/B test voice agent prompts.
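
For live traffic, a deterministic split keeps each call pinned to one variant without storing any assignment state. This sketch hashes a call ID to pick the variant; the ID format and variant labels are placeholders.

```python
import hashlib


def assign_variant(call_id: str, variants=("A", "B")) -> str:
    """Pick a prompt variant deterministically from the call ID."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

If you want a returning caller to always hear the same variant, hash a stable caller identifier instead of the per-call ID.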

Don't tune blindly. When you change the prompt, replay 20 historical calls through both versions and compare.
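
A replay harness can be very small. In this sketch, `run_agent` (which replays one historical call under a given prompt and returns the agent's output) and `check` (which returns True when a rule's outcome shows up) are hypothetical callables you'd supply yourself.

```python
def compare_prompts(calls, run_agent, check, prompt_a, prompt_b):
    """Replay each historical call under both prompts and count rule passes."""
    passes = {"a": 0, "b": 0}
    for call in calls:
        if check(run_agent(prompt_a, call)):
            passes["a"] += 1
        if check(run_agent(prompt_b, call)):
            passes["b"] += 1
    return passes
```

Writing the `check` function is also a good test of the rule itself: if you can't express the expected outcome as a check, the rule is probably too vague.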

Keep a changelog. "Added rule about reading account numbers digit-by-digit because of recurring complaints."

FAQ

Should I write the same prompt for voice and chat? No. Start with the chat version, but rewrite the style guide and add the voice-specific recovery patterns.

How long should a voice prompt be? 800–1500 tokens for most production agents. Longer prompts add latency without much benefit; shorter ones usually mean you're missing rules.

Should the prompt include examples? A few well-chosen examples (1–3) help. Don't include 50 — that's what fine-tuning is for.

How do I know if a rule is working? Replay calls through the prompt and check whether the rule's outcome shows up. If you can't tell, the rule is too vague.

Can the LLM change tone mid-conversation? Yes — your prompt can include conditional rules ("if the caller seems frustrated, slow down and acknowledge their frustration before continuing").

Prompt Engineering for Voice (vs Text) Agents
