🔌 Integrations & Telephony

Twilio + Voice Agents: A Complete Guide

Tyler Weitzman
March 22, 2026 · 6 min read
Speechify

Twilio is the dominant telephony backbone under most voice agent deployments. If you're building on Vapi, Retell, OpenAI Realtime, or SIMBA, odds are your calls flow through Twilio at some point. Understanding how Twilio fits into the stack, what the configuration options are, and where the gotchas live is foundational knowledge for anyone operating a voice AI system in production. This isn't deep Twilio-expert material; it's the working operator's guide to what you actually need to know.

This piece covers Twilio integration patterns for voice agents, common configurations, billing mechanics, and the pitfalls that bite teams in their first three months.

TL;DR

  • Twilio provides the phone number, the SIP/telephony layer, and the programmable voice API that most voice AI stacks depend on.
  • BYO Twilio (you own the account and connect it to your vendor) is usually cheaper and more flexible than vendor-managed Twilio.
  • Key concepts: phone numbers, TwiML, SIP domains, Twilio Voice Insights, A2P 10DLC for SMS.
  • Pricing: voice is billed per minute, with separate rates for inbound vs outbound and for US vs international.
  • Common pitfalls: compliance (TCPA, A2P 10DLC) and toll-free verification.

What Twilio does in a voice agent stack

Twilio sits at the telephony edge:

  • Phone number provisioning. US, international, toll-free, short codes.
  • Call routing. Inbound and outbound, SIP or PSTN.
  • Real-time media streaming. Audio frames to/from your voice AI system via WebSocket or SIP.
  • Programmable call control. Start, transfer, record, conference.
  • Call insights. Quality metrics, failure modes.
  • Messaging. SMS for follow-ups, 2FA, notifications.

Your voice AI (Vapi, Retell, SIMBA, etc.) handles the STT/LLM/TTS and orchestration. Twilio handles getting audio between the PSTN and your AI.

BYO Twilio vs vendor-managed

Most voice AI vendors offer two models:

Vendor-managed. The vendor provisions numbers, owns the Twilio account, and rolls telephony into a per-minute rate. Simpler, but usually 20–40% more expensive and with less control.

Bring Your Own Twilio. You own the Twilio account and the vendor connects to it. Lower cost, more configuration flexibility, more operational work.

For production at scale, BYO almost always wins. For small deployments, vendor-managed is fine for simplicity.

See bring your own Twilio: pros, cons, and setup.

Phone number types

  • Local numbers. Geographic area code. Most common. Cheap.
  • Toll-free. 800/888/877/866/855/844/833. Verification required for outbound (see below).
  • Short codes. 5–6 digit numbers for messaging. Expensive, slow approval.
  • Mobile vs landline provisioning. Matters for SMS.
  • Porting. Moving an existing number to Twilio. Typically 5–15 business days.

Plan your number strategy. For customer-facing brands, toll-free signals legitimacy; for local services, local area code is often preferred.

Inbound call routing

An inbound call to your Twilio number:

  1. Twilio receives the call.
  2. Twilio queries your configured webhook or SIP endpoint.
  3. Your voice AI system answers, starts streaming audio.
  4. Conversation happens.
  5. Call ends; Twilio logs the call.

Configuration is typically via TwiML (XML control), Twilio Studio (visual flow builder), or direct SIP integration depending on your vendor.
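As a minimal sketch of the TwiML option, the webhook queried in step 2 can answer with a response that tells Twilio to stream the call to your AI. This uses only Python's standard library; the function name `build_stream_twiml` and the WebSocket URL are illustrative assumptions, and many vendors generate this response for you.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(stream_url: str) -> str:
    """Return TwiML that connects the call to a media stream at stream_url."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> inside <Connect> asks Twilio to open a WebSocket to stream_url.
    ET.SubElement(connect, "Stream", {"url": stream_url})
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL; replace with your own voice AI endpoint.
    print(build_stream_twiml("wss://your-ai.example.com/voice-stream"))
```

Your webhook would return this string as the HTTP response body with an XML content type.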

Outbound call placement

Your voice AI initiates:

  1. Voice AI calls Twilio API: "place a call to +1555..."
  2. Twilio dials the number.
  3. On connect, Twilio establishes media stream back to your AI.
  4. Conversation happens.
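Step 1 above can be sketched as a plain HTTP request to Twilio's Calls resource. The sketch below only builds the request and does not send it; the helper name `build_call_request`, the credentials, and the phone numbers are placeholders, and in practice most teams use Twilio's helper library or let their voice AI vendor place the call.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_call_request(account_sid: str, auth_token: str,
                       to: str, from_: str, twiml_url: str) -> Request:
    """Build (but do not send) a POST that asks Twilio to place a call."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    # Twilio fetches TwiML from `Url` once the callee answers.
    body = urlencode({"To": to, "From": from_, "Url": twiml_url}).encode()
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return Request(url, data=body, method="POST", headers={
        "Authorization": f"Basic {credentials}",
        "Content-Type": "application/x-www-form-urlencoded",
    })


if __name__ == "__main__":
    # All values here are placeholders, not real credentials or numbers.
    req = build_call_request("ACxxxxxxxx", "your_auth_token",
                             "+15555550100", "+15555550199",
                             "https://your-ai.example.com/twiml")
    print(req.get_method(), req.full_url)
    # To actually place the call: urllib.request.urlopen(req)
```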

Outbound has more compliance considerations: TCPA, A2P 10DLC, caller ID spoofing rules.

See TCPA compliance for AI-powered outbound calls.

TCPA and outbound

US outbound AI calls require TCPA-compliant consent. Twilio provides tools but you own compliance:

  • Prior express consent (PEC) for non-telemarketing.
  • Prior express written consent (PEWC) for telemarketing.
  • Quiet hours enforcement: 8 AM to 9 PM recipient's local time.
  • Do-Not-Call list compliance.
  • AI-generated voice rules, which are stricter than those for human-dialed calls.

Twilio's platform helps with DNC compliance but consent is your responsibility.
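Of the items above, quiet hours are the easiest to enforce in code. Here is a minimal sketch, assuming you already know the recipient's time zone; the function name and the choice to treat 9 PM itself as outside the window are assumptions, and this check does nothing for consent or Do-Not-Call, which need their own gates.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

CALL_WINDOW_START = time(8, 0)   # 8 AM recipient local time
CALL_WINDOW_END = time(21, 0)    # 9 PM recipient local time


def within_calling_hours(now_utc: datetime, recipient_tz: tzinfo) -> bool:
    """True if now_utc falls in the 8 AM to 9 PM window in recipient_tz."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    local_time = now_utc.astimezone(recipient_tz).time()
    # Conservative assumption: the end of the window is exclusive.
    return CALL_WINDOW_START <= local_time < CALL_WINDOW_END


if __name__ == "__main__":
    # Fixed UTC-5 offset for illustration only. In production, pass
    # zoneinfo.ZoneInfo("America/New_York") so daylight saving is handled.
    eastern = timezone(timedelta(hours=-5))
    now = datetime(2026, 3, 22, 14, 0, tzinfo=timezone.utc)
    print(within_calling_hours(now, eastern))
```

Run the check at dial time rather than at scheduling time, since a queued call can drift past the window.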

A2P 10DLC (for SMS)

If your deployment uses SMS follow-ups, A2P 10DLC applies:

  • Brand registration. Your organization registers as a sender.
  • Campaign registration. Specific use cases (notifications, marketing, 2FA, etc.).
  • Throughput tiers. Based on verification.
  • Content filtering. Carriers block certain content categories.

Setup takes weeks. Plan ahead.

See A2P 10DLC explained for voice agent builders.

Toll-free verification

Outbound messaging from toll-free numbers requires verification:

  • Submit verification request through Twilio.
  • Include opt-in language, sample messages, brand details.
  • Approval: typically 1–4 weeks.
  • Denials are common on first submission, so expect to iterate.

Without verification, your toll-free outbound messages get filtered or blocked.

See setting up toll-free verification for AI calling.

Pricing

Rough 2026 Twilio pricing:

  • Phone numbers. ~$1–5/month per number.
  • US inbound voice. ~$0.0085/minute.
  • US outbound voice. ~$0.013/minute.
  • International varies widely. $0.02–$0.50/minute depending on country.
  • SMS. ~$0.008/message outbound, similar inbound.
  • MMS. ~$0.02 outbound.
  • Toll-free voice. ~$0.022/minute inbound, similar outbound.

At scale, negotiate. Committed-usage discounts of 20–40% are common at meaningful volume.
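To get a feel for what these rates mean per month, here is a rough estimator using the approximate US figures listed above. The function name, the sample volumes, and the assumption that a negotiated discount applies to per-minute usage but not to number rental are all illustrative; check your actual contract.

```python
US_INBOUND_PER_MIN = 0.0085   # approximate rate from the list above
US_OUTBOUND_PER_MIN = 0.013   # approximate rate from the list above
NUMBER_PER_MONTH = 1.00       # low end of the ~$1–5/month range


def estimate_monthly_cost(inbound_min: int, outbound_min: int,
                          numbers: int, usage_discount: float = 0.0) -> float:
    """Estimate monthly US voice spend in dollars."""
    usage = inbound_min * US_INBOUND_PER_MIN + outbound_min * US_OUTBOUND_PER_MIN
    # Assumption: the discount applies to per-minute usage only.
    return round(usage * (1 - usage_discount) + numbers * NUMBER_PER_MONTH, 2)


if __name__ == "__main__":
    # 100k inbound + 50k outbound minutes on 10 local numbers.
    print(estimate_monthly_cost(100_000, 50_000, 10))        # list price
    print(estimate_monthly_cost(100_000, 50_000, 10, 0.30))  # 30% usage discount
```

At these sample volumes the telephony bill is about $1,510 at list price and about $1,060 with a 30% usage discount, which is why negotiation starts to matter once volume is meaningful.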

SIP integration

For enterprise deployments, SIP integration bypasses Twilio webhooks:

  • Your voice AI is a SIP endpoint.
  • Twilio SIP Trunking routes calls directly.
  • Lower latency, more control.
  • More complex to operate.

See SIP trunking 101 for voice agent builders.

Observability

Twilio provides:

  • Call logs with status, duration, cost.
  • Voice Insights. Quality metrics such as latency, jitter, and MOS scores.
  • Debugger. Real-time call debugging.
  • Alerts. Webhook-based incident signals.

Integrate these with your own observability. Don't fly blind.

Failover and reliability

Twilio has strong uptime but isn't infallible. Plan for failures:

  • Primary/backup regions. Twilio operates multiple regions.
  • Failover to an alternative SIP provider. Some operators maintain a backup with Vonage, Bandwidth, etc.
  • Graceful degradation. If the voice stack is slow, fall back to voicemail or a queued callback.
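The provider-failover idea can be sketched as an ordered list of carriers tried in turn. Everything here is hypothetical: `place_call` stands in for whatever client code you have per provider, and `simulated_place_call` exists only to demonstrate the behavior.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider fails to place the call."""


def place_call_with_failover(providers: Sequence[str],
                             place_call: Callable[[str, str], str],
                             to_number: str) -> Tuple[str, str]:
    """Try providers in order; return (provider, call_id) for the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, place_call(provider, to_number)
        except ConnectionError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{provider}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def simulated_place_call(provider: str, to_number: str) -> str:
    """Stand-in for real provider clients: the primary is 'down'."""
    if provider == "twilio":
        raise ConnectionError("simulated outage")
    return f"{provider}-call-1"


if __name__ == "__main__":
    print(place_call_with_failover(["twilio", "backup-carrier"],
                                   simulated_place_call, "+15555550100"))
```

Only connection-level failures trigger failover in this sketch; a busy signal or a rejected number should not cause a redial through a second carrier.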

Common pitfalls

Getting caller ID marked as spam. Carriers flag certain calling patterns, and high-volume outbound from a single number without proper verification gets flagged. Use STIR/SHAKEN attestation and branded caller ID, and rotate numbers thoughtfully.

See caller ID and trust: why numbers get marked as spam.

A2P 10DLC compliance gaps. SMS works in dev, fails at scale. Register, and pay for the throughput tiers you need.

Toll-free verification rejections. First submission often rejected. Iterate.

International expansion surprises. Every country has local regulations (DIDs, registration, local presence). Plan per-country.

Billing surprises. Per-minute costs add up fast at scale. Monitor and alert on usage anomalies.
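One simple way to alert on usage anomalies is to compare today's minutes against a trailing baseline. This is a sketch of one possible rule, not a recommendation of specific thresholds: the three-sigma cutoff, the 5% floor on spread, and the function name are all assumptions you would tune to your own traffic.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_usage_anomaly(daily_minutes_history: Sequence[float],
                     today_minutes: float,
                     threshold_sigmas: float = 3.0) -> bool:
    """Flag today if it exceeds the trailing mean by threshold_sigmas deviations."""
    if len(daily_minutes_history) < 2:
        return False  # not enough history to judge
    baseline = mean(daily_minutes_history)
    spread = pstdev(daily_minutes_history)
    # Floor the spread so a very flat history does not flag tiny increases.
    spread = max(spread, 0.05 * baseline)
    return today_minutes > baseline + threshold_sigmas * spread


if __name__ == "__main__":
    last_week = [1000, 1100, 950, 1050, 1000, 980, 1020]
    print(is_usage_anomaly(last_week, 1100))  # ordinary busy day
    print(is_usage_anomaly(last_week, 3000))  # runaway loop or abuse
```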

Sample call flow (inbound)

# Caller dials your Twilio number
# Twilio → webhook to your voice AI
# Your AI responds with TwiML:

<Response>
  <Connect>
    <Stream url="wss://your-ai.example.com/voice-stream" />
  </Connect>
</Response>

# Twilio opens a WebSocket to your AI
# Audio frames flow both directions
# AI handles the conversation
# Either side ends the call
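On the AI side of that WebSocket, each message from Twilio is a JSON object with an `event` field, and audio arrives base64-encoded inside `media` events (Twilio documents the payload as 8 kHz mu-law). The sketch below parses a single message and leaves the WebSocket server itself out; the function name is an assumption, and a real handler would also track the stream ID from the `start` event.

```python
import base64
import json
from typing import Optional, Tuple


def handle_stream_message(raw: str) -> Tuple[Optional[str], Optional[bytes]]:
    """Parse one media-stream message; return (event, audio bytes or None)."""
    message = json.loads(raw)
    event = message.get("event")
    if event == "media":
        # Audio frames are base64-encoded in media.payload.
        return event, base64.b64decode(message["media"]["payload"])
    # Other events (for example "start" and "stop") carry no audio.
    return event, None


if __name__ == "__main__":
    frame = base64.b64encode(b"\xff\x7f").decode()
    sample = json.dumps({"event": "media", "media": {"payload": frame}})
    print(handle_stream_message(sample))
```

The decoded bytes are what you would hand to your speech-to-text layer.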

Integration architecture

Typical modern stack:

  • Twilio: telephony edge.
  • Vendor voice AI (Vapi/Retell/SIMBA/etc.): STT, LLM, TTS, orchestration.
  • Your business logic: CRM integration, function calls, analytics.
  • Your observability: call logs, transcripts, metrics.

Each layer has its own responsibilities and failure modes.

FAQ

Do I need to use Twilio? No. Alternatives: Vonage, Bandwidth, Telnyx, Plivo. Twilio is the most common option but not the only one.

Can I use my existing PBX / telephony? Usually via SIP. Requires SIP trunk configuration.

What about international? Twilio supports global calling. Each country has its own regulatory and pricing considerations.

How do we handle call recording with Twilio? Enable at call-level or via TwiML. Ensure consent per two-party-consent state rules.

What about Twilio Programmable Voice vs Flex? Programmable Voice is the raw API. Flex is a full contact-center platform. Voice AI usually integrates with Programmable Voice.

Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems: text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
