Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that serves millions of users. Tyler writes about the engineering of real-time conversational systems: text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.

Articles by Tyler Weitzman (98)

📊 Guides & Trends

Open-Source vs Proprietary Voice Agent Stacks

The open-source voice AI stack in 2026 is genuinely good. Whisper and its derivatives handle STT. Open-weight LLMs such as Llama 3/4, Qwen, and Mistral handle the reasoning. Open-source TTS (XTTS, StyleTTS, Orpheus-class) handles output.

Tyler Weitzman · Apr 12, 2026 · 6 min
📊 Guides & Trends

Build vs Buy: When to Build Your Own Voice Agent

Build-vs-buy for voice agents in 2026 is a different conversation than it was two years ago. Then, the open-source stack was rough and most serious deployments ended up building.

Tyler Weitzman · Apr 11, 2026 · 7 min
🏭 Industry

Voice Agents for Developer Support

Developer support is a strange category. Developers don't generally want to call anyone. They want Stack Overflow, they want clear docs, they want an LLM that can read their code.

Tyler Weitzman · Apr 7, 2026 · 6 min
🏭 Industry

Compliance and Accessibility for Government Voice AI

Government voice AI has two compliance layers most commercial deployments don't: a set of federal accessibility standards that are legally binding (Section 508, ADA), and a patchwork of privacy and security rules that vary by agency, level of government, and type of data.

Tyler Weitzman · Apr 6, 2026 · 7 min
🏭 Industry

Voice Agents for Loan Servicing and Collections

Loan servicing and collections is one of the highest-volume, most-regulated phone channels in finance. Every month, hundreds of millions of calls flow between lenders and borrowers about payments due, payments missed, hardship, and resolution.

Tyler Weitzman · Apr 3, 2026 · 7 min
🏭 Industry

Compliance Considerations for AI Voice in Banking

Banking is the most heavily regulated industry where voice AI is seeing meaningful deployment. A misstep on compliance here doesn't just create legal exposure — it triggers regulator attention that can chill your entire program.

Tyler Weitzman · Apr 2, 2026 · 6 min
🏭 Industry

HIPAA Compliance for AI Voice Agents in Healthcare

HIPAA compliance is the first gate for any voice AI deployment in US healthcare. Get it wrong and you're exposed to federal penalties, state attorney-general actions, and class-action litigation.

Tyler Weitzman · Apr 1, 2026 · 7 min
🔌 Integrations

How to Integrate Voice Agents with a Custom REST API

Most voice agent integrations are with off-the-shelf systems — Salesforce, HubSpot, Zendesk, Stripe. But eventually every production deployment needs to integrate with a custom internal API — the billing system, the proprietary order management, the ops dashboard that only your…

Tyler Weitzman · Mar 31, 2026 · 7 min
🔌 Integrations

Sending Voice Agent Transcripts to Slack

Slack is where most teams live in 2026, and for voice agent deployments, getting call transcripts and key events into Slack closes a critical ops loop. Escalations land in the right channel with context. QA reviews happen where the team already works.

Tyler Weitzman · Mar 30, 2026 · 6 min
🔌 Integrations

Connecting Voice Agents to Snowflake or BigQuery

Voice agent deployments generate a lot of data. Every call produces a transcript, metadata (duration, outcome, caller info), function-call traces, sentiment signals, and operational metrics.

Tyler Weitzman · Mar 30, 2026 · 6 min
🔌 Integrations

How to Port a Phone Number to Your Voice Agent

You already have a phone number that customers know. Your main line, published on your website, business cards, Google. You can't afford to change it.

Tyler Weitzman · Mar 29, 2026 · 6 min
🔌 Integrations

Setting Up Toll-Free Verification for AI Calling

Toll-free numbers (800, 888, 877, 866, 855, 844, 833) carry a compliance requirement that catches many voice AI deployments off-guard: before you can reliably send SMS or initiate high-volume outbound voice traffic from a toll-free number, you need carrier verification.

Tyler Weitzman · Mar 28, 2026 · 6 min
🔌 Integrations

SIP vs WebRTC for Voice Agents

SIP and WebRTC are the two dominant technologies for real-time voice in 2026. Most voice agent deployments use one, the other, or both. Deciding which to use for a given integration depends on where the call originates, what network conditions you expect, and how much control…

Tyler Weitzman · Mar 28, 2026 · 5 min
🔌 Integrations

How to Use Twilio Studio with AI Voice Agents

Twilio Studio is Twilio's visual flow builder for call (and SMS) workflows. It lets you drag-and-drop a call flow — gather digits, branch on logic, route to agents, trigger webhooks — without writing code.

Tyler Weitzman · Mar 27, 2026 · 5 min
🔌 Integrations

Bring Your Own Twilio: Pros, Cons, and Setup

Bring Your Own Twilio (BYO) is the architecture where your voice agent platform (Vapi, Retell, Simba) connects to your Twilio account rather than using the vendor's managed Twilio setup.

Tyler Weitzman · Mar 27, 2026 · 5 min
🔌 Integrations

Connecting Voice Agents to Stripe for Payments

Taking payments over the phone is a workflow that voice agents get asked to handle constantly — bill payments, copays, service fees, subscription changes, you name it.

Tyler Weitzman · Mar 26, 2026 · 7 min
🔌 Integrations

Sending SMS Follow-Ups from Voice Agents

SMS follow-ups are one of the highest-ROI additions to any voice agent deployment. The caller just had a conversation; they know the appointment time, the tracking link, the next step. But people forget.

Tyler Weitzman · Mar 26, 2026 · 7 min
🔌 Integrations

Calendar Integrations: Cal.com, Google, Outlook

Voice agents that book, reschedule, or cancel appointments live or die on their calendar integration. A voice agent that guesses at availability or writes to the wrong calendar breaks the workflow it was built for.

Tyler Weitzman · Mar 25, 2026 · 7 min
🔌 Integrations

Webhooks 101 for Voice Agents

Webhooks are the backbone of voice agent integrations. When your voice agent needs to call a CRM, update a ticket, send an SMS, or trigger any external action, it does so via HTTP — and most of those HTTP calls are structured as webhooks or webhook-like REST operations.

Tyler Weitzman · Mar 25, 2026 · 7 min
🔌 Integrations

Connecting Voice Agents to Zendesk

Zendesk is the dominant ticketing and support platform for mid-market and enterprise customer service, and it's where most voice agent-handled support interactions need to land.

Tyler Weitzman · Mar 24, 2026 · 6 min
🔌 Integrations

Connecting Voice Agents to Intercom

Intercom is the messaging-first customer communication platform that a lot of SaaS companies run their support on. Historically chat-centric, it's expanded to cover email, a light voice layer, and AI-native tools (Fin).

Tyler Weitzman · Mar 24, 2026 · 6 min
🔌 Integrations

Connecting Voice Agents to HubSpot CRM

HubSpot is the CRM of choice for a large share of SMB and mid-market SaaS companies, and increasingly for mid-market customers in other verticals. Its API is cleaner than Salesforce's, its data model is simpler, and integrations are generally less painful.

Tyler Weitzman · Mar 23, 2026 · 6 min
🔌 Integrations

Connecting Voice Agents to Salesforce CRM

Salesforce is the de facto CRM for most mid-market and enterprise companies deploying voice AI. If your agent is doing anything meaningful in a business context — handling sales inquiries, supporting customers, qualifying leads, processing orders — there's a good chance the…

Tyler Weitzman · Mar 23, 2026 · 6 min
🔌 Integrations

SIP Trunking 101 for Voice Agent Builders

SIP trunking is the unsexy plumbing that makes voice agents work at scale. It's the protocol and infrastructure that lets calls move between the public phone network and your voice AI without relying on a telephony provider's proprietary APIs.

Tyler Weitzman · Mar 22, 2026 · 7 min
🔌 Integrations

Twilio + Voice Agents: A Complete Guide

Twilio is the dominant telephony backbone under most voice agent deployments. If you're building on Vapi, Retell, Simba, or OpenAI Realtime, odds are your calls flow through Twilio at some point.

Tyler Weitzman · Mar 22, 2026 · 6 min
🔊 Speech Tech

Streaming Audio Over WebRTC for Voice Agents

WebRTC is the browser-native way to stream real-time audio. For voice agents embedded in web or mobile apps, it's often the best transport — lower latency than webhooks, built-in encryption, native NAT traversal, cross-platform.

Tyler Weitzman · Mar 21, 2026 · 5 min
🔊 Speech Tech

How to Benchmark a Voice Agent's End-to-End Latency

Vendor-reported latency is a lab number. What matters for your voice agent is measured latency in your production environment, under real network conditions, with your actual content.

Tyler Weitzman · Mar 21, 2026 · 5 min
🔊 Speech Tech

Comparing Neural TTS Architectures

Neural TTS has evolved rapidly since 2018 — Tacotron gave way to WaveNet-style vocoders, which gave way to VALL-E-style neural codec models, which gave way to flow-matching and diffusion-based systems. Each architecture shift brought real quality improvements.

Tyler Weitzman · Mar 20, 2026 · 5 min
🔊 Speech Tech

Phoneme-Level Tuning for Voice Agents

Most voice agent quality work happens at the text level — prompt engineering, SSML, pronunciation dictionaries. But sometimes the right layer is deeper: phonemes, the individual sound units of spoken language.

Tyler Weitzman · Mar 19, 2026 · 4 min
🔊 Speech Tech

Why Some Voices Sound Robotic Even in 2026

TTS in 2026 should sound natural. Most of the time it does. But occasionally a synthetic voice still gives itself away — a weird pause, a flat delivery, a strange pronunciation. Understanding why it happens, and what to do about it, is part of the voice engineering discipline.

Tyler Weitzman · Mar 19, 2026 · 5 min
🔊 Speech Tech

How Sample Rate Affects Voice Agent Quality

Sample rate is one of those low-level audio details that voice agent builders often inherit without thinking about. The STT config says 16 kHz; the TTS outputs 24 kHz; the PSTN leg is 8 kHz.

Tyler Weitzman · Mar 18, 2026 · 5 min
🔊 Speech Tech

Echo Cancellation in Real-Time Voice AI

Echo in voice agent calls sounds like this: agent starts speaking, caller's speaker plays agent's voice, caller's microphone picks up agent's voice, the audio flows back to the agent, agent's STT transcribes its own speech, agent gets confused, conversation breaks down.

Tyler Weitzman · Mar 17, 2026 · 5 min
🔊 Speech Tech

How Background Noise Affects Voice Agent Accuracy

Production voice agents live in noisy environments. Callers call from cars, offices, restaurants, kitchens with running faucets, grocery stores with loud music, outdoor job sites. Real audio has sirens, barking dogs, other conversations, and TV in the background.

Tyler Weitzman · Mar 17, 2026 · 4 min
🔊 Speech Tech

Audio Codecs for Voice Agents: Opus, PCMU, and More

Audio codecs determine the quality, bandwidth, and latency of every voice agent call. The choice between G.711, Opus, G.722, and others affects how your audio sounds over the line, how much bandwidth you consume, and how well STT and TTS perform.

Tyler Weitzman · Mar 16, 2026 · 5 min
🔊 Speech Tech

Diarization: Knowing Who's Speaking in a Voice Conversation

Speaker diarization is the task of answering "who spoke when?" Given audio with multiple speakers, diarization outputs time-stamped segments labeled by speaker. For most voice agent use cases — one caller, one agent — diarization is trivial (channel-based separation works).

Tyler Weitzman · Mar 16, 2026 · 5 min
🔊 Speech Tech

Voice Activity Detection in Production Voice Agents

Voice Activity Detection (VAD) is the unglamorous infrastructure that decides when the caller has started speaking, when they've paused, and when they're definitively done. It sits upstream of STT, LLM, and TTS, so bad VAD can ruin an otherwise excellent voice agent.

Tyler Weitzman · Mar 15, 2026 · 5 min
🔊 Speech Tech

The Engineering Behind Sub-Second Voice Agents

Sub-second voice agents — end-to-end latency under 1000ms from caller speech end to agent speech start — used to be aspirational. In 2026 it's table stakes for production voice AI, and leading deployments are hitting sub-500ms.

Tyler Weitzman · Mar 15, 2026 · 4 min
🔊 Speech Tech

How STT Handles Disfluencies and Filler Words

Real speech is messy. People say "um," "uh," "like," and "you know" constantly. They start sentences and abandon them. They repeat themselves. They mumble and correct.

Tyler Weitzman · Mar 14, 2026 · 5 min
🔊 Speech Tech

Multilingual TTS: Choosing a Voice Model

Multilingual text-to-speech in 2026 is good but uneven. English is excellent. Spanish, French, German, Mandarin, and Japanese are strong. Beyond the top 10 languages, quality drops noticeably.

Tyler Weitzman · Mar 14, 2026 · 4 min
🔊 Speech Tech

Why TTS Quality Plateaus and How to Push Past It

Every voice AI team eventually hits the TTS quality plateau. You pick a good TTS provider, tune some basics, and quality is... fine. Not amazing, not bad. Specific edge cases stay wrong. Certain phrases sound robotic. Numbers get weird. Tone lacks variation.

Tyler Weitzman · Mar 13, 2026 · 5 min
🔊 Speech Tech

How TTS Models Handle Numbers, Dates, and Acronyms

Numbers, dates, and acronyms are the trickiest content for TTS. "Dr. Smith will see you on 3/12/2026 for your $47.50 copay" seems simple until you realize the model has to decide: is "3/12" a date or a fraction? Is "$47.50" dollars or just numbers? Is "Dr." "Doctor" or "Drive"?

Tyler Weitzman · Mar 13, 2026 · 5 min
🔊 Speech Tech

Streaming STT: How to Cut Recognition Latency

Non-streaming speech-to-text works for transcription — you submit audio, wait, get a transcript. That pattern is fine for batch use cases but fatal for voice agents.

Tyler Weitzman · Mar 12, 2026 · 5 min
🔊 Speech Tech

Streaming TTS: How to Cut First-Audio Latency

First-audio latency — the time from when the TTS receives text to when the caller hears the first sound — is one of the biggest levers in voice agent latency optimization.

Tyler Weitzman · Mar 12, 2026 · 5 min
🔊 Speech Tech

Latency Engineering for Real-Time Voice Agents

Latency is what separates voice agents that feel conversational from those that feel broken. Humans expect responses within 700ms of finishing a sentence — anything longer triggers a "did they hear me?" reaction. Sub-500ms feels alive. Sub-300ms feels exceptional.

Tyler Weitzman · Mar 11, 2026 · 5 min
🔊 Speech Tech

Voice Cloning: How It Works and Why It Matters

Voice cloning — the technology to replicate a specific person's voice from a short audio sample — has been one of the most disruptive developments in voice AI. In 2022 it was a research curiosity requiring hours of training data.

Tyler Weitzman · Mar 10, 2026 · 5 min
🔊 Speech Tech

Speech-to-Text Word Error Rate Explained

Word Error Rate — WER — is the dominant quality metric for speech-to-text. Every STT vendor reports WER. Every evaluation report ranks models by WER. Most voice agent engineers know the term but have at best a fuzzy sense of what the number really means in production.

Tyler Weitzman · Mar 9, 2026 · 5 min
🔊 Speech Tech

Text-to-Speech in 2026: The State of the Art

Text-to-speech in 2026 has crossed a threshold most people alive today didn't expect to see. Blind A/B tests consistently show that 70–85% of listeners can't reliably distinguish synthetic voices from real recordings of humans.

Tyler Weitzman · Mar 9, 2026 · 4 min
🎯 Lead Qual

Connecting Voice Lead Qual to HubSpot

HubSpot is the CRM of choice for most SMB and mid-market SaaS, and voice-AI-qualified leads land there more often than in Salesforce for that segment. HubSpot's data model is cleaner than Salesforce's, the API is friendlier, and the integration workload is lower.

Tyler Weitzman · Feb 26, 2026 · 5 min
🎯 Lead Qual

Connecting Voice Lead Qual to Salesforce

Salesforce is where most enterprise sales teams live. For voice-AI-qualified leads to generate real pipeline, they have to land in Salesforce cleanly — right object type, right owner, right stage, right custom fields populated.

Tyler Weitzman · Feb 26, 2026 · 5 min
🎯 Lead Qual

How to Score Leads From a Voice Conversation

A voice conversation is a rich source of signal for lead scoring — far richer than a form submission or a website visit. The caller tells you their role, their company, their need, their timeline, and their tone.

Tyler Weitzman · Feb 22, 2026 · 5 min
📞 Outbound

Outbound Agent Metrics That Actually Matter

Outbound voice AI deployments can produce dashboards dense with metrics. Calls dialed, calls answered, average handle time, average time to first word, sentiment score, coverage rate, disposition breakdown, opt-out rate, compliance incident rate. Many of these are interesting.

Tyler Weitzman · Feb 19, 2026 · 5 min
📞 Outbound

How to Build a Compliant Outbound Voice Agent in 30 Days

Getting an outbound AI voice agent live in 30 days sounds ambitious — but it's achievable for focused deployments. The critical path is compliance setup (TCPA, A2P 10DLC, number verification), not technology. The voice AI itself can be configured in a week.

Tyler Weitzman · Feb 17, 2026 · 5 min
📞 Outbound

Caller ID and Trust: Why Numbers Get Marked as Spam

You deploy an outbound voice AI campaign. First week goes great. Second week, answer rates drop 40%. Third week, your phone numbers start showing up as "Scam Likely" on caller ID. What happened?

Tyler Weitzman · Feb 16, 2026 · 5 min
📞 Outbound

DTMF and IVR Navigation for Outbound Voice Agents

Outbound voice agents calling businesses often encounter IVR systems — "press 1 for sales, press 2 for support" phone trees that the AI needs to navigate to reach the right person.

Tyler Weitzman · Feb 16, 2026 · 5 min
📞 Outbound

A2P 10DLC Explained for Voice Agent Builders

If your voice agent sends SMS from a standard 10-digit US phone number, A2P 10DLC compliance is part of your stack — whether you know it or not.

Tyler Weitzman · Feb 10, 2026 · 5 min
📞 Outbound

TCPA Compliance for AI-Powered Outbound Calls

TCPA — the Telephone Consumer Protection Act — is the federal law that governs automated and pre-recorded outbound calls in the United States. AI-generated voice calls fall squarely under TCPA's stricter rules for "artificial or prerecorded voice" messages.

Tyler Weitzman · Feb 10, 2026 · 6 min
💬 Support

How to Tag and Categorize AI Conversations

Conversation tagging is what turns thousands of AI-handled calls into actionable insight. Every call should get tagged with intent, outcome, sentiment, and any anomalies — automatically, consistently, and in a way that supports both real-time routing and after-the-fact…

Tyler Weitzman · Feb 5, 2026 · 4 min
💬 Support

Quality Assurance for AI Voice Support

Quality assurance for AI voice support is mostly the same as QA for human contact centers — but with different staffing, different tools, and a much higher possible cadence. Done well, AI QA closes the loop between observation and prompt iteration in days instead of months.

Tyler Weitzman · Feb 5, 2026 · 5 min
💬 Support

Why First-Contact Resolution Is the North Star for AI Support

If you can only track one metric for AI customer support, it should be First-Contact Resolution (FCR). Not deflection. Not handle time. Not even CSAT.

Tyler Weitzman · Jan 31, 2026 · 6 min
💬 Support

CSAT for AI Agents: Benchmarks and Frameworks

Customer Satisfaction (CSAT) is the closest thing to a north star for support agents. Tracking it for AI agents specifically — and comparing it against human-handled equivalents — is the single most useful operational habit for any team running customer-facing AI.

Tyler Weitzman · Jan 29, 2026 · 5 min
💬 Support

What Is AI Deflection (and How to Measure It)

"Deflection" is the most-cited and most-misunderstood metric in AI customer support. Vendors quote 80% deflection rates. Buyers don't always know what that means or how to verify it.

Tyler Weitzman · Jan 26, 2026 · 5 min
🧠 LLMs

How to Handle Personally Identifiable Information in Voice Agents

Voice agents collect PII constantly — names, phone numbers, addresses, dates of birth, account numbers, sometimes even social security numbers and credit cards. Handling this responsibly isn't optional.

Tyler Weitzman · Jan 25, 2026 · 5 min
🧠 LLMs

Open-Source vs Closed-Source LLMs for Voice Agents

The open-source LLM ecosystem caught up to closed models faster than anyone expected. Llama 3.3, Mistral, Qwen — all good enough for most voice agent use cases.

Tyler Weitzman · Jan 24, 2026 · 4 min
🧠 LLMs

How LLMs Decide What to Say Next in a Voice Conversation

Step inside the LLM's "head" for a moment and look at how it picks what to say on each turn of a voice call. The answer is less mysterious than the term "AI" suggests and more interesting than "next-token prediction" implies.

Tyler Weitzman · Jan 23, 2026 · 5 min
🧠 LLMs

Red-Teaming Your Voice Agent

Red-teaming is the practice of deliberately trying to break your voice agent before adversaries (or just confused customers) do it for you. Most teams skip it. The ones that do it find embarrassing failures fast — and fix them before they cost real money.

Tyler Weitzman · Jan 23, 2026 · 5 min
🧠 LLMs

Building a Conversation Memory Layer for Voice Agents

The model has no memory beyond what you put in its context window. For a 5-minute support call this is fine. For longer calls, multi-call interactions, or agents that need to remember preferences across sessions, you need an explicit memory layer.

Tyler Weitzman · Jan 22, 2026 · 5 min
🧠 LLMs

Why Context Windows Matter Less Than You Think for Voice

LLM marketing has been all about context window expansion — 128K, 200K, 1M, 2M tokens. For voice agents, this race mostly doesn't matter. Voice conversations rarely exceed 5,000 tokens of meaningful context.

Tyler Weitzman · Jan 22, 2026 · 4 min
🧠 LLMs

How to A/B Test Voice Agent Prompts

Most teams don't A/B test voice agent prompts. They tweak the prompt, listen to a few calls, and ship if it "feels better." This works until it doesn't — until a tweak that helps one use case silently breaks another.

Tyler Weitzman · Jan 21, 2026 · 5 min
🧠 LLMs

Streaming LLM Outputs to Voice: The Engineering

Streaming the LLM's output to TTS as it generates is the difference between a snappy voice agent and a sluggish one. The basic idea is simple: don't wait for the model to finish thinking before you start speaking.

Tyler Weitzman · Jan 21, 2026 · 6 min
🧠 LLMs

The Role of Embeddings in Voice Agent Knowledge

Embeddings are the numerical representations of text that make retrieval-augmented generation work. Most voice agent builders never have to think about embeddings directly — their platform handles them.

Tyler Weitzman · Jan 20, 2026 · 5 min
🧠 LLMs

Multi-Agent Architectures for Customer Service

When a single agent gets too complex — too many intents, too many tools, conflicting style requirements — teams reach for multi-agent architectures. A "router" or "supervisor" routes turns to specialized sub-agents (a billing expert, a tech support expert, a returns expert).

Tyler Weitzman · Jan 20, 2026 · 5 min
🧠 LLMs

How to Stop a Voice Agent from Hallucinating

Hallucination is the failure mode that scares everyone off voice AI faster than anything else. The agent confidently tells a customer the wrong policy, the wrong price, or makes up a refund.

Tyler Weitzman · Jan 19, 2026 · 6 min
🧠 LLMs

Designing System Prompts for Multi-Turn Voice Conversations

The system prompt is the single most-iterated artifact in any production voice agent. It's where most of the agent's personality, rules, and reliability live. Most teams underinvest here, treating the prompt as a "set it and forget it" string.

Tyler Weitzman · Jan 19, 2026 · 6 min
🧠 LLMs

Tool Use vs Function Calling: What's the Difference?

You'll hear "tool use" and "function calling" used interchangeably in voice agent docs. They mean roughly the same thing. The reason both terms exist is mostly historical — different vendors named the same idea differently.

Tyler Weitzman · Jan 18, 2026 · 4 min
🧠 LLMs

Why Smaller LLMs Often Win for Voice Agents

There's a strong reflex in AI: bigger model = better outcome. For voice agents specifically, this reflex is often wrong. A fast 8B parameter model with sub-200ms time-to-first-token can outperform a 70B frontier model on nearly every voice metric that matters.

Tyler Weitzman · Jan 17, 2026 · 5 min
🧠 LLMs

Guardrails for Voice Agents: A Pragmatic Take

Guardrails are the rules that prevent your voice agent from doing things it shouldn't — agreeing to refunds it can't authorize, giving medical advice, leaking PII, or making up policies.

Tyler Weitzman · Jan 17, 2026 · 6 min
🧠 LLMs

Retrieval-Augmented Generation for Voice Agents

RAG — retrieval-augmented generation — is the standard pattern for grounding an LLM in a specific knowledge base. For voice agents, RAG works the same as for chatbots, with one crucial difference: every millisecond of retrieval latency shows up in the conversation.

Tyler Weitzman · Jan 16, 2026 · 5 min
🧠 LLMs

LLM Evaluation for Conversational Agents

You can't tune what you can't measure. Evaluation is the unsexy work that separates voice agent teams shipping production-quality work from teams flying blind. Most teams underinvest here for the first few months, then have a wake-up moment when something breaks.

Tyler Weitzman · Jan 16, 2026 · 6 min
🧠 LLMs

How to Give a Voice Agent Long-Term Memory

By default, voice agents have no memory beyond the current call. The caller hangs up, the agent forgets everything. For many use cases this is fine. For loyalty-driven businesses where the same caller comes back repeatedly, it's a missed opportunity.

Tyler Weitzman · Jan 15, 2026 · 5 min
🧠 LLMs

Prompt Engineering for Voice (vs Text) Agents

If you've written prompts for chatbots, you have a head start on voice agents — but only halfway. The fundamentals of clear instructions and tool definitions carry over. The style guide, the latency considerations, and the failure-mode handling are very different.

Tyler Weitzman · Jan 15, 2026 · 6 min
🧠 LLMs

Function Calling for Voice Agents: A Practical Guide

Function calling is the feature that turns a voice agent from a chatbot with audio into an actual worker. Without it, the agent can talk about looking up your account; with it, the agent can actually do it.

Tyler Weitzman · Jan 14, 2026 · 6 min
🧠 LLMs

How Large Language Models Power Voice Agents

When people ask "what's inside a voice agent?" they usually want to hear about the LLM. That's fair — the LLM is the most visible new piece of the stack.

Tyler Weitzman · Jan 14, 2026 · 10 min
🎙️ Fundamentals

The Hidden Complexity of Numbers in Voice Agents

Numbers are the most underestimated source of pain in voice AI. Phone numbers, account numbers, dates, prices, addresses — all of them have edge cases that turn a clean conversation into a back-and-forth of "no, one nine seven, not nineteen seven." The fix isn't a better LLM;…

Tyler Weitzman · Jan 13, 2026 · 6 min
🎙️ Fundamentals

How Voice Agents Handle Accents and Dialects

Voice AI is great at standard American English. It's pretty good at standard British, Australian, and Indian English. It's variably good at everything else.

Tyler Weitzman · Jan 13, 2026 · 5 min
🎙️ Fundamentals

How to Measure Voice Agent Quality

Most voice agent teams measure the wrong things. They watch deflection rate and call duration; they ignore the quality of what happened inside the call. The result: agents that look good on dashboards and feel bad on the phone.

Tyler Weitzman · Jan 12, 2026 · 5 min
🎙️ Fundamentals

The Difference Between Streaming and Non-Streaming Voice Agents

Streaming is the most underrated word in voice AI. The difference between a streaming and a non-streaming pipeline is the difference between a voice agent that feels alive and one that feels like a slow walkie-talkie.

Tyler Weitzman · Jan 10, 2026 · 5 min
🎙️ Fundamentals

How Voice Agents Recover from Misunderstandings

Real conversations have misunderstandings. The agent mishears a name, asks the wrong clarifying question, or jumps to the wrong intent. How the agent recovers matters more than how often it stumbles. A graceful recovery can leave the caller feeling like the agent is competent.

Tyler Weitzman · Jan 10, 2026 · 5 min
🎙️ Fundamentals

How Voice Agents Decide When to Stop Talking

A voice agent that doesn't know when to shut up is one of the most annoying things in software. Even if every word is right, an agent that talks past the moment when the caller wanted to interject feels worse than no agent at all.

Tyler Weitzman · Jan 9, 2026 · 5 min
🎙️ Fundamentals

Synchronous vs Asynchronous Voice Agents

Most voice agents are synchronous: a real-time phone call where the agent and the caller exchange turns immediately. But there's a quietly growing class of asynchronous voice agents — voice messaging, voicemail-style interactions, scheduled callbacks.

Tyler Weitzman · Jan 8, 2026 · 5 min
🎙️ Fundamentals

What Makes a Voice Agent "Production Ready"

A voice agent that works in a demo is a different product from one that works in production. The demo only has to handle the happy path with a friendly tester.

Tyler Weitzman · Jan 8, 2026 · 5 min
🎙️ Fundamentals

Why Voice Agents Sound More Human Every Year

Five years ago, you could spot a synthetic voice in three seconds. Today the best ones can run a 5-minute conversation without anyone noticing.

Tyler Weitzman · Jan 7, 2026 · 5 min
🎙️ Fundamentals

How Voice Agents Differ from Voice Assistants

Siri, Alexa, and Google Assistant are voice assistants. The system that picks up your dentist's phone and books your cleaning is a voice agent. Both involve talking to a computer, but they're different products with different design constraints.

Tyler Weitzman · Jan 7, 2026 · 5 min
🎙️ Fundamentals

How Voice Agents Handle Interruptions Gracefully

Interruption handling is the single most-felt UX detail in voice AI. Done well, the agent feels conversational and responsive. Done poorly, the agent runs over you, doesn't notice, and you end up shouting at your phone. This is the engineering and design behind getting it right.

Tyler Weitzman · Jan 4, 2026 · 6 min
🎙️ Fundamentals

The Anatomy of a Voice Agent Pipeline

If you took every voice agent in production today and dissected them, you'd find roughly the same skeleton. The names change. The vendors change. The plumbing details vary.

Tyler Weitzman · Jan 4, 2026 · 9 min
🎙️ Fundamentals

Turn-Taking and Barge-In: The Mechanics of Natural Conversation

Two humans on a phone call don't take turns the way a tennis match does. They overlap. They interrupt. They finish each other's sentences. They leave 200ms gaps between turns and call it polite. A voice agent that can't do this — even if every word is correct — feels broken.

Tyler Weitzman · Jan 3, 2026 · 6 min
🎙️ Fundamentals

Latency in Voice AI: Why Sub-500ms Matters

When two humans talk, the gap between one person finishing a sentence and the other starting their reply is tiny — usually around 200ms. Sometimes the next person starts speaking before the first person has actually finished, predicting the end of the sentence.

Tyler Weitzman · Jan 3, 2026 · 9 min
🎙️ Fundamentals

Voice Agents vs Chatbots: When to Use Which

A chatbot is a turn-based text exchange with no real-time pressure. A voice agent is a real-time spoken conversation with a tight latency budget and a much messier input channel.

Tyler Weitzman · Jan 2, 2026 · 5 min
🎙️ Fundamentals

How a Conversational Voice Agent Actually Works (Under the Hood)

If you open the box on a modern voice agent, you'll find roughly four moving parts: a streaming speech recognizer, a language model, a text-to-speech engine, and a turn-taking referee that decides whose turn it is to speak. None of that is exotic on its own.

Tyler Weitzman · Jan 1, 2026 · 9 min