Function Calling for Voice Agents: A Practical Guide
Function calling is the feature that turns a voice agent from a chatbot with audio into an actual worker. Without it, the agent can talk about looking up your account; with it, the agent can actually do it.
Function calling is the feature that turns a voice agent from a chatbot with audio into an actual worker. Without it, the agent can talk about looking up your account; with it, the agent can actually do it. The basic idea is simple but the implementation has quirks worth understanding before you ship.
TL;DR
- Function calling lets the LLM emit structured requests to call your code (lookup CRM, book appointment, transfer call).
- Three things matter most: clear function names, clear descriptions, and tight parameter schemas.
- For voice, latency matters โ long-running functions need a "let me check" bridge.
- Cap timeouts. Always cap timeouts. A function call that hangs for 5 seconds breaks the conversation.
How it works
Modern LLMs accept a list of available functions alongside the prompt. The model can choose to either reply with text or to emit a structured function call:
{
"name": "lookup_caller",
"arguments": { "phone_number": "+14155550199" }
}
Your orchestration layer intercepts that call, runs the actual code (a database query, an API call), and returns the result to the model. The model continues with that new context.
For voice, the typical flow:
- Caller asks something that requires data lookup.
- LLM emits a function call.
- Your code executes the function.
- Result flows back to the LLM.
- LLM generates a reply with the result baked in.
- TTS speaks the reply.
Steps 2โ5 happen between the caller's turn and the agent's reply. Latency budget is tight.
Designing function names and descriptions
The names and descriptions are what the LLM uses to decide when to call each function. Take them seriously.
Bad:
{ name: "lookup", description: "look something up" }
Good:
{
name: "lookup_caller_by_phone",
description: "Look up a customer record using their phone number. Returns name, account status, and recent order history. Call this whenever the agent needs to identify the caller or fetch their account data."
}
The good version tells the model when to call the function, what it returns, and why you'd want to.
Rules of thumb:
- Function names: verb_noun_modifier.
lookup_account_by_email, notgetAccount. - Descriptions: 2โ4 sentences. Include when to call AND when not to call.
- If you have many similar functions, explicitly differentiate them in the descriptions.
Designing parameter schemas
Use proper JSON schema with types and descriptions:
{
"type": "object",
"properties": {
"phone_number": {
"type": "string",
"description": "E.164 format phone number, e.g. +14155550199"
},
"include_history": {
"type": "boolean",
"description": "Whether to include the caller's last 10 orders"
}
},
"required": ["phone_number"]
}
Make required fields explicit. Use enums where applicable ("status: 'pending' | 'completed' | 'cancelled'"). Be explicit about formats (E.164, ISO date, etc.).
The latency problem
Function calls take time. A typical breakdown:
- Network round-trip to your API: 50โ200ms
- Database query or third-party API: 100โ800ms
- Network round-trip back: 50โ200ms
Total: 200msโ1.2 seconds. That's added to the LLM's response latency, which is added to the caller's perceived wait.
Two mitigations:
1. Cap timeouts. Every function should have a hard timeout (typically 1.5โ3 seconds). If it doesn't return, the agent says "I'm having trouble looking that up โ let me try again" or escalates.
2. Bridge with chitchat. When the LLM calls a function it knows might be slow, your prompt should tell it to say something first: "Let me check on that."
The bridge pattern is implemented in the prompt:
When you call a function that may take more than 1.5 seconds
(like get_appointment_history or sync_external_system),
first say something to the caller like "let me look that up"
or "one moment" before making the call.
When to make a function call vs answer from memory
Common bug: the LLM "remembers" something and answers from that instead of looking it up. For static info (your hours, your return policy), this is fine. For dynamic info (current order status, today's availability), this is dangerous.
The fix is in the prompt. Be explicit:
Always call get_order_status before answering questions about
the caller's order. Do not rely on prior conversation context
for order status โ orders change in real time.
This is the most underused move in production prompts.
Function-call reliability
In practice, three things go wrong:
1. The model picks the wrong function. Mitigation: clearer descriptions; fewer overlapping functions; explicit examples in the prompt.
2. The model fills the wrong arguments. Mitigation: tighter schemas; explicit format examples; require fields the model can't easily fudge.
3. The model calls a function it shouldn't. Mitigation: explicit "do not call X if Y" rules; guardrails that intercept and reject inappropriate calls.
Reliability for major hosted LLMs is 95%+ on well-designed function schemas. Below that, your schemas need work.
Real-world function patterns
A few shapes that recur across most production agents:
Lookup function. get_X_by_Y(...) returns structured data. Usually fast.
Mutation function. book_appointment(slot, caller_id) โ confirmation. Slower; needs idempotency.
External API call. send_sms_followup(phone, message) โ status. May fail; needs retry logic.
Transfer function. transfer_to_human(reason, context) โ handoff. Should always succeed; if it doesn't, that's an emergency.
Search function. search_knowledge(query) โ list of matching docs. Often slow; cache.
Testing function calling
The eval workflow:
- Pick 50 representative call transcripts.
- For each, list the functions the agent should have called.
- Replay through your current prompt; record what the agent actually called.
- Score: did it call the right function? did it fill the right arguments?
Run this before every prompt change. It catches regressions in function reliability that human grading often misses.
Related reading
- Tool Use vs Function Calling: What's the Difference?
- How Large Language Models Power Voice Agents
- Designing Voice Agents That Ask Better Questions
- Open-Source vs Closed-Source LLMs for Voice Agents
- How LLMs Decide What to Say Next in a Voice Conversation
FAQ
What's the difference between function calling and tool use? They're synonyms. "Tool use" is the older term; "function calling" is what most APIs use today.
Should I use one big function or many small ones? Many small ones. The model picks more reliably when each function has a clear single purpose.
Can the LLM call multiple functions in parallel? Some models support this; most production agents don't take advantage of it because serial is easier to reason about.
How do I handle function errors? Return a structured error to the LLM ("status: 'error', message: 'caller not found'"). The model can then decide how to phrase it to the user.
What about function calling cost? Function-calling overhead is small (~10% extra tokens per call). The bigger cost driver is whatever your function actually does.

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems โ text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
More from Tyler Weitzman
View all โOpen-Source vs Proprietary Voice Agent Stacks
The open-source voice AI stack in 2026 is genuinely good. Whisper and its derivatives handle STT. Open-weight LLMs like Llama 3/4, Qwen, Mistral handle the reasoning. Open-source TTS (XTTS, StyleTTS, Orpheus-class) handles output.
Build vs Buy: When to Build Your Own Voice Agent
Build-vs-buy for voice agents in 2026 is a different conversation than it was two years ago. Then, the open-source stack was rough and most serious deployments ended up building.
Voice Agents for Developer Support
Developer support is a strange category. Developers don't generally want to call anyone. They want Stack Overflow, they want clear docs, they want an LLM that can read their code.
Related reading
Tool Use vs Function Calling: What's the Difference?
You'll hear "tool use" and "function calling" used interchangeably in voice agent docs. They mean roughly the same thing. The reason both terms exist is mostly historical โ different vendors named the same idea differently.
Designing Voice Agents That Ask Better Questions
A voice agent that asks bad questions wastes the caller's time and produces bad data. Good questions feel natural and capture what you need in fewer turns.
Open-Source vs Closed-Source LLMs for Voice Agents
The open-source LLM ecosystem caught up to closed models faster than anyone expected. Llama 3.3, Mistral, Qwen โ all good enough for most voice agent use cases.
Voice AI, twice a month.
Get the best of the SIMBA resources hub โ new articles, trend notes, and operator guides. No spam.
