Designing System Prompts for Multi-Turn Voice Conversations

Tyler Weitzman
January 19, 2026 · 6 min read
Speechify

The system prompt is the single most-iterated artifact in any production voice agent. It's where most of the agent's personality, rules, and reliability live. Most teams underinvest here, treating the prompt as a "set it and forget it" string. The teams shipping good voice agents treat prompts as production code, with versioning, testing, and discipline.

TL;DR

  • A good voice agent prompt has six sections: identity, goal, tools, voice style, hard rules, escalation.
  • 800–1500 tokens total. Longer prompts add to time to first token (TTFT); shorter ones usually mean missing rules.
  • Iterate by A/B testing against an eval set, not by intuition.
  • The biggest mistakes: vague rules, conflicting rules, and a missing voice style guide.

The six sections

Walking through them in order:

Identity (50–100 tokens)

Who the agent is. Be specific.

You are Maya, the receptionist at Cornerstone Dental Group, a
12-dentist practice in Boston with locations in Cambridge,
Newton, and downtown.

Include enough context for the agent to answer "where are you located?" without a function call.

Goal (1-3 sentences)

The single most important sentence in the prompt. What is this call for?

Your job is to book new-patient appointments, reschedule
existing ones, and answer simple questions about insurance
acceptance and office locations. For anything else, escalate
to a human.

Without this, the agent tries to do everything and does nothing well.

Tools (50–200 tokens, depending on count)

List every function the agent can call, with a clear description per function. The descriptions matter as much as the names: they are what the LLM uses to decide when to call each one.

Tools you can call:

get_available_slots(date_range)
  Returns a list of open appointment slots in the given date range.
  Call this when the caller wants to book or reschedule and you
  need to know what's available.

book_appointment(slot_time, caller_name, reason)
  Books the appointment. Returns a confirmation number. Call this
  only after confirming the slot, name, and reason with the caller.

lookup_caller_by_phone(phone_number)
  Returns the caller's existing patient record. Call this near the
  start of the call to recognize repeat callers.

transfer_to_human(reason)
  Transfers the call to a human receptionist. Call this if the
  caller is frustrated, asks for a manager, or asks about anything
  outside your goal (scheduling, insurance acceptance, office
  locations).
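
One way to keep these descriptions from drifting away from the code is to render the prompt's tool section from a single registry. The sketch below is a minimal illustration, not a platform API: the `Tool` dataclass and `render_tools_section` helper are hypothetical names, only two of the four tools are shown, and their descriptions are abbreviated from the prompt text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    params: tuple[str, ...]
    description: str  # what it returns and when to call it


# Hypothetical registry; a real one would list every callable function.
TOOLS = [
    Tool(
        name="get_available_slots",
        params=("date_range",),
        description=(
            "Returns a list of open appointment slots in the given date range. "
            "Call this when the caller wants to book or reschedule."
        ),
    ),
    Tool(
        name="transfer_to_human",
        params=("reason",),
        description=(
            "Transfers the call to a human receptionist. "
            "Call this if the caller is frustrated or asks for a manager."
        ),
    ),
]


def render_tools_section(tools: list[Tool]) -> str:
    """Render the plain-text 'Tools you can call' block for the system prompt."""
    lines = ["Tools you can call:"]
    for tool in tools:
        lines.append("")
        lines.append(f"{tool.name}({', '.join(tool.params)})")
        lines.append(f"  {tool.description}")
    return "\n".join(lines)


print(render_tools_section(TOOLS))
```

Generating the section this way means a renamed parameter or a reworded description changes in one place, and the prompt follows.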

Voice style guide (200–400 tokens)

The rules that make the agent sound human. Six to ten rules:

Speaking style:
- Use short sentences. One main clause per sentence.
- Never use bullet points, numbered lists, or formatting.
- When confirming a date or phone number, say each part slowly
  and pause briefly between digits/numbers.
- When you're about to do something that takes more than 1.5
  seconds (function calls), say "let me check on that" or
  "one moment" first.
- Never read more than three options aloud in a row. If you
  have more, summarize ("I have several morning slots open.
  Would you like me to list them?").
- Don't start every reply with "Sure" or "Of course". Vary
  your acknowledgments naturally.
- If the caller corrects you, acknowledge briefly ("apologies,
  let me update that") and continue.
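
Style rules like these can also be spot-checked in code, for example against logged replies or before text reaches TTS. The following is a rough sketch under that assumption; `find_style_violations` and its regex patterns are illustrative and far from a complete check.

```python
import re

# Illustrative patterns for text that reads badly when spoken aloud.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
MARKDOWN = re.compile(r"\*\*|__|`|^#{1,6}\s", re.MULTILINE)
STOCK_OPENERS = ("sure", "of course")


def find_style_violations(replies: list[str]) -> list[str]:
    """Return human-readable violations for a list of agent replies."""
    violations = []
    for i, reply in enumerate(replies):
        if BULLET.search(reply):
            violations.append(f"reply {i}: contains a bullet or numbered list")
        if MARKDOWN.search(reply):
            violations.append(f"reply {i}: contains markdown formatting")
    # Flag the agent if every reply opens with a stock acknowledgment.
    openers = [r.strip().lower().startswith(STOCK_OPENERS) for r in replies]
    if len(replies) > 1 and all(openers):
        violations.append("every reply starts with 'Sure' or 'Of course'")
    return violations


print(find_style_violations([
    "Sure, here are the slots:\n- 9 am\n- 10 am",
    "Of course. **Tuesday** works.",
]))
```

A check like this does not replace the prompt rules; it only tells you when they are being ignored.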

Hard rules (100–300 tokens)

Things to never do, things to always do.

Hard rules:
- Never quote prices. If asked, say "for pricing questions
  I'll need to transfer you to our office staff."
- Never give medical advice. If asked, say "I'm not qualified
  to advise on that. You should speak with the dentist."
- Never claim to be human. If asked directly, say "I'm an
  AI assistant for Cornerstone Dental, happy to help with
  scheduling."
- Always confirm appointment details by reading them back
  before booking.

Escalation (50–150 tokens)

When and how to hand off.

Escalation:
Transfer to a human via transfer_to_human if:
- The caller is upset or asks for a manager.
- The caller asks about clinical details (symptoms, treatment
  options).
- The caller asks about insurance disputes or billing.
- You can't understand the caller after two clarification
  attempts.

When transferring, briefly tell the caller what is happening
("Thanks. I'm connecting you to one of our office staff
who can help with that. One moment."), then call the function.

Total length

Sum: roughly 600–1400 tokens for a typical agent. Add 100–300 if you have many functions or a complex use case.

If your prompt is over 2000 tokens, you're probably bloating it. Read through and cut.
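
The length check above can be automated. Here is a minimal sketch, assuming a rough four-characters-per-token estimate rather than the model's real tokenizer (swap in your provider's tokenizer for accurate counts). `estimate_tokens`, `build_prompt`, `budget_report`, and the placeholder section texts are all hypothetical.

```python
SECTION_ORDER = ["identity", "goal", "tools", "voice_style", "hard_rules", "escalation"]
MAX_TOKENS = 2000  # the "probably bloated" threshold from the text


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token for English text."""
    return max(1, len(text) // 4)


def build_prompt(sections: dict[str, str]) -> str:
    """Join the six sections in a fixed order, failing loudly if one is missing."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt sections: {missing}")
    return "\n\n".join(sections[name].strip() for name in SECTION_ORDER)


def budget_report(sections: dict[str, str]) -> dict[str, int]:
    """Per-section token estimates plus a total."""
    report = {name: estimate_tokens(sections[name]) for name in SECTION_ORDER}
    report["total"] = sum(report.values())
    return report


# Placeholder texts stand in for the real section contents.
sections = {name: f"{name} text " * 10 for name in SECTION_ORDER}
prompt = build_prompt(sections)
report = budget_report(sections)
if report["total"] > MAX_TOKENS:
    print("Prompt is over budget; read through and cut.")
```

A per-section report is more useful than a single total, because it shows which section grew when the prompt drifts past its budget.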

Versioning

Treat prompts as code:

  • Store them in your repo (or your platform's version control).
  • Version every change.
  • Document the reason for each change in the commit message.
  • A/B test changes before shipping.
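
As one possible way to make "version every change" concrete, the sketch below fingerprints the prompt text so each call log can record exactly which prompt produced it. The `PromptVersion` record, the `make_version` helper, and the 12-character hash prefix are illustrative choices, not a standard.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    version: str  # human-readable label, e.g. a git tag
    sha: str      # content fingerprint, logged with every call
    reason: str   # why the change was made (mirrors the commit message)


def fingerprint(prompt_text: str) -> str:
    """Short SHA-256 prefix of the prompt text; same text, same fingerprint."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


def make_version(version: str, prompt_text: str, reason: str) -> PromptVersion:
    return PromptVersion(version=version, sha=fingerprint(prompt_text), reason=reason)


v1 = make_version("v1", "You are Maya, the receptionist.", "initial prompt")
v2 = make_version(
    "v2",
    "You are Maya, the receptionist. Never quote prices.",
    "callers were given price estimates",
)
print(v1.sha, v2.sha)
```

Logging the fingerprint alongside each call makes it possible to tie a bad call back to the exact prompt text that was live at the time.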

Iteration patterns

A few practical patterns for iterating:

The "what went wrong" log. When you observe a bad call, write down what the agent should have done. Use that as the basis for a new rule.

The "tighten or loosen" check. Each rule should have a clear effect. If you can't articulate the effect, the rule probably doesn't help.

The "delete first" instinct. Before adding a new rule, see if you can rephrase an existing one.

The "examples earn their tokens" rule. Each example in the prompt costs tokens on every turn. Only include examples that demonstrate something the description couldn't.
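
The "what went wrong" log pairs naturally with an eval set: each bad call becomes a test case, and a prompt change ships only if it does not lower the pass rate. A minimal sketch of that loop follows. Everything here is hypothetical scaffolding: `EvalCase`, `pass_rate`, and especially `fake_agent`, which stands in for a real call to your model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    caller_says: str
    must_include: str  # substring the reply should contain (case-insensitive)


# An agent is any function mapping (system_prompt, caller_utterance) -> reply.
Agent = Callable[[str, str], str]


def pass_rate(agent: Agent, prompt: str, cases: list[EvalCase]) -> float:
    """Fraction of eval cases whose reply contains the expected substring."""
    passed = sum(
        case.must_include.lower() in agent(prompt, case.caller_says).lower()
        for case in cases
    )
    return passed / len(cases)


def fake_agent(prompt: str, caller_says: str) -> str:
    """Stand-in for a model call; obeys the pricing rule only if the prompt has it."""
    if "price" in caller_says.lower() and "never quote prices" in prompt.lower():
        return "For pricing questions I'll need to transfer you to our office staff."
    return "A cleaning is about ninety dollars."


cases = [EvalCase("What's the price of a cleaning?", "transfer")]
prompt_a = "You are Maya, the receptionist."
prompt_b = prompt_a + " Never quote prices."

rate_a = pass_rate(fake_agent, prompt_a, cases)
rate_b = pass_rate(fake_agent, prompt_b, cases)
print(f"A: {rate_a:.0%}  B: {rate_b:.0%}")
```

Substring matching is a crude grader; it is used here only to keep the sketch short. The structure is the point: two prompt variants, one shared set of cases, one comparable number per variant.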

Common mistakes

Patterns I see repeatedly:

Vague rules. "Be polite." This doesn't help, because the agent doesn't know what polite means in your context. Better: "Use the caller's name once they share it. Acknowledge their issue before redirecting."

Conflicting rules. "Be empathetic" + "Don't make commitments." In some scenarios these conflict. Resolve the conflict explicitly, for example: "Acknowledge the caller's frustration, but never promise a specific outcome."

Untested assumptions. "The agent will figure it out from context." Maybe. Test.

Forgetting voice style. Most prompt failures come from missing voice-specific style guidance. The agent reads aloud "bullet point one" because nothing told it not to.

Missing escalation criteria. If the agent doesn't know when to escalate, it'll either escalate too much or too little. Both bad.

For deeper iteration practices, see how to A/B test voice agent prompts.

FAQ

How long should my prompt be? 800–1500 tokens for most production agents. Aim for the lower end.

Should the prompt include examples? 1–3 well-chosen examples for each major behavior. Don't include 50.

Can I use markdown formatting in the prompt? The model can read it. Whether you should depends on the model โ€” some follow plain prose better.

Should I write the prompt in second person ("you") or third person? Second person is standard. More direct.

How often should I update my prompt? Whenever you observe a recurring pattern that the prompt should handle. Most production agents see 2–10 changes per month after launch.

Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems: text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
