Conversational AI & LLMs
How large language models power voice agents — prompting, function calling, memory, evaluations, and orchestration.
22 articles
How to Handle Personally Identifiable Information in Voice Agents
Voice agents collect PII constantly — names, phone numbers, addresses, dates of birth, account numbers, sometimes even social security numbers and credit cards. Handling this responsibly isn't optional.
Designing Voice Agents That Ask Better Questions
A voice agent that asks bad questions wastes the caller's time and produces bad data. Good questions feel natural and capture what you need in fewer turns.
Open-Source vs Closed-Source LLMs for Voice Agents
The open-source LLM ecosystem caught up to closed models faster than anyone expected. Llama 3.3, Mistral, Qwen — all good enough for most voice agent use cases.
How LLMs Decide What to Say Next in a Voice Conversation
Step inside the LLM's "head" for a moment and look at how it picks what to say on each turn of a voice call. The answer is less mysterious than the term "AI" suggests and more interesting than "next-token prediction" implies.
Red-Teaming Your Voice Agent
Red-teaming is the practice of deliberately trying to break your voice agent before adversaries (or just confused customers) do it for you. Most teams skip it. The ones that do it find embarrassing failures fast — and fix them before they cost real money.
Building a Conversation Memory Layer for Voice Agents
The model has no memory beyond what you put in its context window. For a 5-minute support call this is fine. For longer calls, multi-call interactions, or agents that need to remember preferences across sessions, you need an explicit memory layer.
Why Context Windows Matter Less Than You Think for Voice
LLM marketing has been all about context window expansion — 128K, 200K, 1M, 2M tokens. For voice agents, this race mostly doesn't matter. Voice conversations rarely exceed 5,000 tokens of meaningful context.
How to A/B Test Voice Agent Prompts
Most teams don't A/B test voice agent prompts. They tweak the prompt, listen to a few calls, and ship if it "feels better." This works until it doesn't — until a tweak that helps one use case silently breaks another.
Streaming LLM Outputs to Voice: The Engineering
Streaming the LLM's output to TTS as it generates is the difference between a snappy voice agent and a sluggish one. The basic idea is simple: don't wait for the model to finish thinking before you start speaking.
The Role of Embeddings in Voice Agent Knowledge
Embeddings are the numerical representations of text that make retrieval-augmented generation work. Most voice agent builders never have to think about embeddings directly — their platform handles them.
Multi-Agent Architectures for Customer Service
When a single agent gets too complex (too many intents, too many tools, conflicting style requirements), teams reach for multi-agent architectures. A "router" or "supervisor" hands each turn to a specialized sub-agent, such as a billing expert, a tech support expert, or a returns expert.
How to Stop a Voice Agent from Hallucinating
Hallucination is the failure mode that scares everyone off voice AI faster than anything else. The agent confidently quotes a customer the wrong policy or the wrong price, or makes up a refund.
Designing System Prompts for Multi-Turn Voice Conversations
The system prompt is the single most-iterated artifact in any production voice agent. It's where most of the agent's personality, rules, and reliability live. Most teams underinvest here, treating the prompt as a "set it and forget it" string.
Tool Use vs Function Calling: What's the Difference?
You'll hear "tool use" and "function calling" used interchangeably in voice agent docs. They mean roughly the same thing. The reason both terms exist is mostly historical — different vendors named the same idea differently.
Why Smaller LLMs Often Win for Voice Agents
There's a strong reflex in AI: bigger model = better outcome. For voice agents specifically, this reflex is often wrong. A fast 8B-parameter model with sub-200ms time-to-first-token can outperform a 70B-parameter frontier model on nearly every voice metric that matters.
Guardrails for Voice Agents: A Pragmatic Take
Guardrails are the rules that prevent your voice agent from doing things it shouldn't — agreeing to refunds it can't authorize, giving medical advice, leaking PII, or making up policies.
Retrieval-Augmented Generation for Voice Agents
RAG — retrieval-augmented generation — is the standard pattern for grounding an LLM in a specific knowledge base. For voice agents, RAG works the same as for chatbots, with one crucial difference: every millisecond of retrieval latency shows up in the conversation.
LLM Evaluation for Conversational Agents
You can't tune what you can't measure. Evaluation is the unsexy work that separates voice agent teams shipping production-quality work from teams flying blind. Most teams underinvest here for the first few months, then have a wake-up moment when something breaks.
How to Give a Voice Agent Long-Term Memory
By default, voice agents have no memory beyond the current call. The caller hangs up, the agent forgets everything. For many use cases this is fine. For loyalty-driven businesses where the same caller comes back repeatedly, it's a missed opportunity.
Prompt Engineering for Voice (vs Text) Agents
If you've written prompts for chatbots, you have a head start on voice agents, but it only gets you halfway. The fundamentals of clear instructions and tool definitions carry over. The style guide, the latency considerations, and the failure-mode handling are very different.
Function Calling for Voice Agents: A Practical Guide
Function calling is the feature that turns a voice agent from a chatbot with audio into an actual worker. Without it, the agent can talk about looking up your account; with it, the agent can actually do it.
How Large Language Models Power Voice Agents
When people ask "what's inside a voice agent?" they usually want to hear about the LLM. That's fair — the LLM is the most visible new piece of the stack.