Multilingual Support: When and How to Add a Second Language
Adding a second language to an AI voice agent feels simple on paper — the models support it, the TTS is available, switch a flag. In practice, good multilingual support is a project. Done well, it unlocks new markets. Done poorly, it confuses customers in both languages. This is the practical playbook.
TL;DR
- Add a second language when you have clear volume in that language, not speculatively.
- Translate the system prompt; don't just feed the English one to a multilingual model.
- Native speakers must review the prompt and test real calls.
- Expect 2-4 weeks of work for a properly done second language.
When to add a second language
Triggers:
- Measurable volume. 10%+ of your customers prefer a language you don't support.
- Geographic expansion. New market where the primary language isn't English.
- Competitive pressure. Competitors offer multilingual; you need parity.
- Regulatory requirement. Some jurisdictions require native-language support.
Don't add languages speculatively. "We might need Spanish someday" isn't a reason to build it now.
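As a rough illustration of the "measurable volume" trigger, the sketch below computes each language's share of customers and flags unsupported languages at or above 10%. The function names and the sample data are hypothetical, and it assumes you already record one preferred-language code per customer.

```python
from collections import Counter


def language_share(preferred_languages):
    """Return each language's share of customers as a fraction of the total."""
    counts = Counter(preferred_languages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


def languages_to_consider(preferred_languages, supported, threshold=0.10):
    """List unsupported languages whose share meets the threshold."""
    shares = language_share(preferred_languages)
    return sorted(
        lang
        for lang, share in shares.items()
        if lang not in supported and share >= threshold
    )


# Hypothetical sample: one preferred-language code per customer.
sample = ["en"] * 80 + ["es"] * 15 + ["fr"] * 5
print(languages_to_consider(sample, supported={"en"}))  # ['es']
```

Here Spanish clears the threshold at 15% and French does not at 5%. The 10% cutoff comes from the trigger above; adjust it to your own business.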
What "multilingual" actually requires
Five things:
1. STT in the target language. Your speech recognizer needs to transcribe the language accurately.
2. TTS in the target language. Your text-to-speech needs to speak it naturally.
3. LLM that handles the language well. Most frontier LLMs are strong in major languages; check smaller ones.
4. A translated + localized system prompt. Not just translated — localized. Different greeting conventions, register, politeness norms.
5. Native-speaker testing. Real calls from real native speakers before launch.
Miss any of these and your "multilingual" agent feels broken.
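One way to keep the five requirements visible is a simple launch checklist per language. This is a minimal sketch; the `LanguageReadiness` class and its field names are invented for illustration and are not part of any platform.

```python
from dataclasses import dataclass, fields


@dataclass
class LanguageReadiness:
    """Hypothetical launch checklist mirroring the five requirements."""
    stt_validated: bool = False
    tts_validated: bool = False
    llm_validated: bool = False
    prompt_localized: bool = False
    native_speaker_tested: bool = False

    def missing(self):
        """Names of the requirements that are not yet satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready(self):
        """True only when every requirement is satisfied."""
        return not self.missing()


spanish = LanguageReadiness(
    stt_validated=True,
    tts_validated=True,
    llm_validated=True,
    prompt_localized=True,
)
print(spanish.ready())    # False
print(spanish.missing())  # ['native_speaker_tested']
```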
Translating the system prompt
The common mistake: machine-translate the English prompt into Spanish with ChatGPT and deploy it without review.
The better approach:
- Have a native speaker translate.
- Adapt register for the target language (Spanish has tú vs usted; Japanese has formal/informal distinctions; etc.).
- Adjust cultural references (idioms, currency format, examples).
- Re-test with native callers.
Budget: 4-8 hours of native-speaker work per language.
Picking STT and TTS
For English → Spanish (as an example):
STT:
- Deepgram: strong Spanish support, streaming.
- AssemblyAI: good.
- Whisper (self-hosted): good but slower in streaming mode.
TTS:
- Simba Multilingual v2: many Spanish voices, high quality.
- OpenAI TTS: multilingual built-in.
- Cartesia: growing multilingual support.
Test each on your specific prompts. Quality varies by speaker style, regional accent, and content domain.
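For the STT side, one common way to compare candidates on your own content is word error rate (WER) against transcripts written by a native speaker. The sketch below is a plain-Python WER using word-level edit distance. It is not tied to any vendor's API, and the Spanish sentences are made-up examples.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "quiero saber el estado de mi pedido"
hypothesis = "quiero saber el estado de pedido"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.143
```

Run the same recordings through each recognizer and compare WER per accent and per content domain, since an average across everything can hide a weak spot.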
Handling accents within a language
Spanish isn't one accent. Mexican, Castilian, Colombian, Argentinian all sound different. Same for:
- English (US, UK, Australian, Indian)
- French (France, Canadian)
- Portuguese (Brazilian, European)
For each language, decide which accent to use:
- Match your primary customer base.
- Use the accent most neutral across regions (harder to define).
- Offer multiple voices if you serve multiple regions.
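If you do serve multiple regions, voice selection can be sketched as a lookup with fallbacks: an exact regional match first, then a neutral voice for the base language, then a default. The voice IDs and the `VOICES` table below are placeholders, not real identifiers from any TTS provider.

```python
# Hypothetical voice IDs keyed by locale; replace with your provider's IDs.
VOICES = {
    "es-MX": "voice_es_mx_01",
    "es-ES": "voice_es_es_01",
    "es": "voice_es_neutral_01",
    "en-US": "voice_en_us_01",
}


def pick_voice(locale, voices=VOICES, default="voice_en_us_01"):
    """Prefer an exact regional voice, then the base language, then a default."""
    if locale in voices:
        return voices[locale]
    base_language = locale.split("-")[0]
    return voices.get(base_language, default)


print(pick_voice("es-MX"))  # voice_es_mx_01
print(pick_voice("es-AR"))  # voice_es_neutral_01
print(pick_voice("fr-CA"))  # voice_en_us_01
```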
Language detection
Two approaches:
- Upfront selection. Caller picks language via IVR or DTMF at call start. Clean but adds friction.
- Auto-detection. Model detects language from the caller's first utterance, responds in that language.
Auto-detection works well for major languages but falls apart with mixed-language speakers. For B2C deployments with a strong language mix, default to explicit selection.
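The two approaches can also be combined. The sketch below trusts an explicit keypad choice first, accepts auto-detection only above a confidence threshold, and otherwise signals that the caller should hear the language menu. The `DTMF_MENU` mapping, the 0.8 threshold, and the idea that your detector returns a confidence score are all assumptions for illustration.

```python
# Hypothetical IVR menu: keypad digit -> language code.
DTMF_MENU = {"1": "en", "2": "es"}


def choose_language(detected, confidence, dtmf_digit=None, min_confidence=0.8):
    """Return a language code, or None if the caller should be asked."""
    supported = set(DTMF_MENU.values())
    # An explicit keypad choice always wins.
    if dtmf_digit in DTMF_MENU:
        return DTMF_MENU[dtmf_digit]
    # Accept auto-detection only for supported languages at high confidence.
    if detected in supported and confidence >= min_confidence:
        return detected
    return None


print(choose_language("es", 0.95))                  # es
print(choose_language("es", 0.50))                  # None
print(choose_language("es", 0.50, dtmf_digit="1"))  # en
```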
Code-switching
Some speakers mix languages mid-sentence ("envíame el order status please"). Most multilingual LLMs understand this reasonably well but may respond in an unexpected language.
Practical rule: the agent responds in the language the customer predominantly speaks in. If they switch mid-call, the agent can switch too — but gracefully.
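A minimal sketch of that rule, assuming you already have a per-utterance language label from your STT or a classifier: keep a short window of recent utterances and switch the response language only when another language holds a majority of a full window. The class name and the window size of three are illustrative choices.

```python
from collections import Counter, deque


class ResponseLanguageTracker:
    """Follow the caller's predominant language without flip-flopping."""

    def __init__(self, initial, window=3):
        self.current = initial
        self.recent = deque(maxlen=window)

    def observe(self, utterance_language):
        """Record one caller utterance and return the language to respond in."""
        self.recent.append(utterance_language)
        top, count = Counter(self.recent).most_common(1)[0]
        window_full = len(self.recent) == self.recent.maxlen
        # Switch only when another language has a strict majority of a full window.
        if top != self.current and window_full and count > self.recent.maxlen / 2:
            self.current = top
        return self.current


tracker = ResponseLanguageTracker(initial="es")
print([tracker.observe(lang) for lang in ["es", "en", "es", "en", "en"]])
# ['es', 'es', 'es', 'en', 'en']
```

A single stray English utterance does not move the agent off Spanish; a sustained shift does.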
Testing plan
Before launching a new language:
- Recruit 5 native speakers.
- Have each place 5 test calls covering your main intents.
- Record everything.
- Review: were responses natural? Was tone right? Did the agent understand accented speech?
- Fix issues; repeat.
Plan 1-2 rounds of iteration before real customer traffic.
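To keep the review organized, the test calls can be laid out as a speaker-by-intent grid with one record per call. This is only a bookkeeping sketch; the speaker labels, intent names, and review fields are placeholders for your own.

```python
from itertools import product


def build_test_plan(speakers, intents):
    """One record per (speaker, intent) pair; review fields start unset."""
    return [
        {"speaker": s, "intent": i, "natural": None, "tone_ok": None, "understood": None}
        for s, i in product(speakers, intents)
    ]


def open_issues(plan):
    """Calls that were reviewed and failed at least one check."""
    checks = ("natural", "tone_ok", "understood")
    return [call for call in plan if any(call[k] is False for k in checks)]


speakers = [f"speaker_{n}" for n in range(1, 6)]          # 5 native speakers
intents = ["order_status", "refund", "billing", "reschedule", "cancel"]
plan = build_test_plan(speakers, intents)
print(len(plan))  # 25

# Mark one call as reviewed, with a tone problem.
plan[0].update(natural=True, tone_ok=False, understood=True)
print(len(open_issues(plan)))  # 1
```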
Common multilingual pitfalls
- Forcing English idioms into the prompt. "Got it" / "sure thing" don't translate. Use native expressions.
- Wrong formality register. Using tú (informal) in Spanish business contexts where usted (formal) is expected.
- Ignoring number / date format differences. "1,234.56" vs "1.234,56". "3/4/26" means different dates in different regions.
- Skipping native QA. An English-speaking team can't verify Spanish quality.
- Tone drift between languages. English agent sounds warm; Spanish agent sounds stiff. Match the persona across languages.
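The number and date pitfall is easy to show in code. The sketch below uses a small hand-written `FORMATS` table, which is an assumption for illustration; a production system would more likely rely on a locale library than maintain this by hand.

```python
from datetime import date

# Hypothetical per-locale conventions; extend or replace with a locale library.
FORMATS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "{m}/{d}/{y}"},
    "es-ES": {"thousands": ".", "decimal": ",", "date": "{d}/{m}/{y}"},
}


def format_number(value, locale):
    """Format with two decimals using the locale's separators."""
    fmt = FORMATS[locale]
    us_style = f"{value:,.2f}"  # e.g. "1,234.56"
    # Swap separators through a placeholder so one replace doesn't undo the other.
    return (
        us_style.replace(",", "\0")
        .replace(".", fmt["decimal"])
        .replace("\0", fmt["thousands"])
    )


def format_date(d, locale):
    """Short numeric date in the locale's day/month order."""
    return FORMATS[locale]["date"].format(d=d.day, m=d.month, y=d.year % 100)


print(format_number(1234.56, "en-US"))          # 1,234.56
print(format_number(1234.56, "es-ES"))          # 1.234,56
print(format_date(date(2026, 3, 4), "en-US"))   # 3/4/26
print(format_date(date(2026, 3, 4), "es-ES"))   # 4/3/26
```

The same calendar day produces "3/4/26" in one locale and "4/3/26" in the other, which is why a voice agent is usually safer speaking the month by name.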
Per-language metrics
Track metrics per language separately:
- Resolution rate
- CSAT
- Average handle time
- Escalation rate
Metrics often differ meaningfully from one language to another. That's useful signal, not noise.
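A per-language breakdown can be sketched as a simple group-by over call records. The record fields used here (`language`, `resolved`, `escalated`, `handle_time_s`, `csat`) are assumed names; map them to whatever your call logs actually contain.

```python
from collections import defaultdict


def metrics_by_language(calls):
    """Aggregate the four metrics separately for each language."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call["language"]].append(call)

    report = {}
    for lang, rows in grouped.items():
        n = len(rows)
        report[lang] = {
            "calls": n,
            "resolution_rate": sum(r["resolved"] for r in rows) / n,
            "escalation_rate": sum(r["escalated"] for r in rows) / n,
            "avg_handle_time_s": sum(r["handle_time_s"] for r in rows) / n,
            "avg_csat": sum(r["csat"] for r in rows) / n,
        }
    return report


# Hypothetical call records.
calls = [
    {"language": "en", "resolved": True, "escalated": False, "handle_time_s": 180, "csat": 5},
    {"language": "en", "resolved": True, "escalated": False, "handle_time_s": 220, "csat": 4},
    {"language": "es", "resolved": False, "escalated": True, "handle_time_s": 300, "csat": 3},
    {"language": "es", "resolved": True, "escalated": False, "handle_time_s": 260, "csat": 4},
]
report = metrics_by_language(calls)
print(report["en"]["resolution_rate"], report["es"]["resolution_rate"])  # 1.0 0.5
```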
Operational considerations
- Escalation paths. If the agent escalates, you need native-speaker humans on the other side. Plan the staffing.
- Knowledge base. Translate and localize the KB, not just the prompt. Re-embed.
- Compliance. Required disclosures must be given in the local jurisdiction's language.
- Customer comms. Confirmation emails and SMS messages should all be in the right language.
Related reading
- The Definitive Guide to AI Customer Support in 2026
- Building a Tier-1 AI Support Agent Step by Step
- Why "Human-in-the-Loop" Beats "Fully Autonomous" for Most Teams
- How to Calculate ROI for AI Customer Support
- How AI Support Agents Should Handle Account Verification
FAQ
How many languages can one agent handle? Technically unlimited. Operationally, 3-5 is manageable. More than that, consider separate agents per language.
Can I use the same KB for multiple languages? The KB needs to be in the target language. Auto-translation at query time is usually worse than pre-translated.
Will my costs double with a second language? No — the AI platform costs scale with usage, not language count. Your ops costs may scale.
Should I pilot multilingual on a single intent? Yes — same philosophy as any deployment. One intent, one language, prove it, expand.
What about rarely-spoken languages? Quality is materially lower for low-resource languages. Test extensively before launch.

Rohan Pavuluri builds SIMBA Voice Agents at Speechify. Previously, he founded and led Upsolve, the largest nonprofit in the United States serving low-income Americans through technology. He writes about real-world voice-agent deployments — customer support, outbound sales, AI receptionists — and the practical product, design, and operational lessons that actually move the needle.
