Adding a second language to an AI voice agent looks simple on paper: the models support it, TTS voices are available, and it seems like flipping a flag. In practice, good multilingual support is a project. Done well, it unlocks new markets. Done poorly, it confuses customers in both languages. This is the practical playbook.

TL;DR

Add a second language when you have clear volume in that language, not speculatively.
Translate the system prompt; don't just feed the English one to a multilingual model.
Native speakers must review the prompt and test real calls.
Expect 2-4 weeks of work for a properly done second language.

When to add a second language

Triggers:

Measurable volume. 10%+ of your customers prefer a language you don't support.

Geographic expansion. New market where the primary language isn't English.

Competitive pressure. Competitors offer multilingual; you need parity.

Regulatory requirement. Some jurisdictions require native-language support.

Don't add languages speculatively. "We might need Spanish someday" isn't a reason to build it now.
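
To make the "measurable volume" trigger concrete, here is a minimal sketch that computes each language's share of callers and flags unsupported languages at or above the 10% mark. The input shape (one preferred-language code per customer) and the function names are assumptions for illustration, not part of any platform's API.

```python
from collections import Counter


def language_share(preferred_languages):
    """Return each language's share of customers as a fraction of the total."""
    counts = Counter(preferred_languages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


def languages_worth_adding(preferred_languages, supported, threshold=0.10):
    """List unsupported languages whose share meets the threshold (10% by default)."""
    shares = language_share(preferred_languages)
    return sorted(
        lang
        for lang, share in shares.items()
        if lang not in supported and share >= threshold
    )


# Hypothetical sample: 80 English, 15 Spanish, 5 French preferences.
sample = ["en"] * 80 + ["es"] * 15 + ["fr"] * 5
candidates = languages_worth_adding(sample, supported={"en"})
```

With this sample, only Spanish clears the threshold; French stays below it and would not justify the build yet.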

What "multilingual" actually requires

Five things:

1. STT in the target language. Your speech recognizer needs to transcribe the language accurately.

2. TTS in the target language. Your text-to-speech needs to speak it naturally.

3. LLM that handles the language well. Most frontier LLMs are strong in major languages; test less widely spoken languages explicitly before committing.

4. A translated + localized system prompt. Not just translated — localized. Different greeting conventions, register, politeness norms.

5. Native-speaker testing. Real calls from real native speakers before launch.

Miss any of these and your "multilingual" agent feels broken.
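
One way to keep the five requirements honest is a launch checklist in code. The sketch below is hypothetical: the `LanguageProfile` class, its field names, and the placeholder model names are all assumptions, and a real deployment would map them onto whatever your platform exposes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LanguageProfile:
    """Hypothetical per-language config; every field name is illustrative."""
    code: str                               # e.g. "es-MX"
    stt_model: Optional[str] = None         # 1. STT in the target language
    tts_voice: Optional[str] = None         # 2. TTS in the target language
    llm_model: Optional[str] = None         # 3. LLM that handles the language
    localized_prompt: Optional[str] = None  # 4. translated and localized prompt
    native_testing_done: bool = False       # 5. native-speaker testing


def missing_requirements(profile: LanguageProfile) -> List[str]:
    """Return the names of the requirements that are not yet satisfied."""
    checks = {
        "stt": bool(profile.stt_model),
        "tts": bool(profile.tts_voice),
        "llm": bool(profile.llm_model),
        "localized_prompt": bool(profile.localized_prompt),
        "native_testing": profile.native_testing_done,
    }
    return [name for name, ok in checks.items() if not ok]


# Placeholder values, not real model or voice names.
spanish = LanguageProfile(
    code="es-MX", stt_model="stt-x", tts_voice="voice-y", llm_model="llm-z"
)
gaps = missing_requirements(spanish)
```

Here the profile has speech and model components wired up but would still be blocked from launch on the prompt and native testing.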

Translating the system prompt

The common mistake: translate English → Spanish with ChatGPT; deploy.

The better approach:

Have a native speaker translate.
Adapt register for the target language (Spanish has tú vs usted; Japanese has formal/informal distinctions; etc.).
Adjust cultural references (idioms, currency format, examples).
Re-test with native callers.

Budget: 4-8 hours of native-speaker work per language.

Picking STT and TTS

For English → Spanish (as an example):

STT:

Deepgram: strong Spanish support, streaming.
AssemblyAI: good.
Whisper (self-hosted): good accuracy, but it is not built for streaming, so expect higher latency in real-time use.

TTS:

Simba Multilingual v2: many Spanish voices, high quality.
OpenAI TTS: multilingual built-in.
Cartesia: growing multilingual support.

Test each on your specific prompts. Quality varies by speaker style, regional accent, and content domain.

Handling accents within a language

Spanish isn't one accent. Mexican, Castilian, Colombian, Argentinian all sound different. Same for:

English (US, UK, Australian, Indian)
French (France, Canadian)
Portuguese (Brazilian, European)

For each language, decide which accent to use:

Match your primary customer base.
Use the accent most neutral across regions (harder to define).
Offer multiple voices if you serve multiple regions.

Language detection

Two approaches:

Upfront selection. Caller picks language via IVR or DTMF at call start. Clean but adds friction.

Auto-detection. Model detects language from the caller's first utterance, responds in that language.

Auto-detection works well for major languages, but it falls apart with callers who mix languages. For B2C with a strong language mix, default to explicit selection.
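
The two approaches can also be combined. The sketch below assumes a hypothetical setup: a DTMF menu mapping digits to languages, and a detector that returns a (language, confidence) pair for the first utterance. The 0.8 threshold and all names are illustrative assumptions, not values from any specific STT vendor.

```python
from typing import Optional, Tuple

# Hypothetical menu: "press 1 for English, 2 for Spanish".
DTMF_MENU = {"1": "en", "2": "es"}
SUPPORTED = {"en", "es"}
DEFAULT_LANGUAGE = "en"


def choose_language(
    dtmf_digit: Optional[str],
    detected: Optional[Tuple[str, float]],
    min_confidence: float = 0.8,
) -> Tuple[str, str]:
    """Return (language, reason), where reason is 'explicit', 'auto', or 'ask'."""
    # An explicit caller choice always wins.
    if dtmf_digit in DTMF_MENU:
        return DTMF_MENU[dtmf_digit], "explicit"
    # Trust auto-detection only for supported languages and confident results.
    if detected is not None:
        language, confidence = detected
        if language in SUPPORTED and confidence >= min_confidence:
            return language, "auto"
    # Otherwise start in the default language and ask the caller to choose.
    return DEFAULT_LANGUAGE, "ask"
```

The "ask" outcome is the safety valve for mixed-language callers: a low-confidence detection leads to a question rather than a guess.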

Code-switching

Some speakers mix languages mid-sentence ("envíame el order status please"). Most multilingual LLMs understand this reasonably well but may respond in an unexpected language.

Practical rule: the agent responds in the language the customer predominantly speaks in. If they switch mid-call, the agent can switch too — but gracefully.
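
A minimal sketch of that rule, assuming you already get a per-turn language label from somewhere (your STT or a classifier): keep a short window of recent turns and switch only when a different language holds a strict majority. The class name and the window size of 4 are illustrative assumptions.

```python
from collections import Counter, deque


class ResponseLanguageTracker:
    """Track which language the agent should respond in during one call."""

    def __init__(self, initial: str, window: int = 4):
        self.current = initial
        # Only the most recent `window` caller turns are considered.
        self.recent = deque([initial], maxlen=window)

    def observe(self, turn_language: str) -> str:
        """Record the language of the latest caller turn; return the response language."""
        self.recent.append(turn_language)
        top, top_count = Counter(self.recent).most_common(1)[0]
        # Switch only on a strict majority, so one stray phrase changes nothing.
        if top != self.current and top_count > len(self.recent) / 2:
            self.current = top
        return self.current


tracker = ResponseLanguageTracker("es")
for turn in ["es", "en", "en", "en"]:
    response_language = tracker.observe(turn)
```

In this run the agent stays in Spanish through the first two English turns and switches on the third, once English dominates the window. A larger window makes the agent slower to switch; a smaller one makes it more reactive.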

Testing plan

Before launching a new language:

Recruit 5 native speakers.
Have each place 5 test calls covering your main intents.
Record everything.
Review: were responses natural? Was tone right? Did the agent understand accented speech?
Fix issues; repeat.

Plan 1-2 rounds of iteration before real customer traffic.
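
The plan above can be laid out as a simple review sheet. This is a sketch under stated assumptions: the speaker labels, intent names, and review fields are hypothetical, and intents are rotated across calls so every main intent is covered.

```python
from itertools import cycle


def build_test_plan(speakers, intents, calls_per_speaker=5):
    """Assign test calls to speakers, rotating through intents for coverage."""
    if not intents:
        raise ValueError("at least one intent is required")
    intent_cycle = cycle(intents)
    plan = []
    for speaker in speakers:
        for _ in range(calls_per_speaker):
            plan.append({
                "speaker": speaker,
                "intent": next(intent_cycle),
                # Filled in by the reviewer after listening to the recording.
                "natural": None,
                "tone_ok": None,
                "understood": None,
            })
    return plan


speakers = [f"speaker_{i}" for i in range(1, 6)]          # 5 native speakers
intents = ["book_appointment", "order_status", "cancel"]  # hypothetical intents
plan = build_test_plan(speakers, intents)                 # 25 calls to review
```

Each row maps to one recorded call, and the three review fields mirror the review questions: natural responses, right tone, and understanding of accented speech.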

Common multilingual pitfalls

Forcing English idioms into the prompt. "Got it" / "sure thing" don't translate. Use native expressions.

Wrong formality register. Using tú (informal) in Spanish business contexts where usted (formal) is expected.

Ignoring number / date format differences. "1,234.56" vs "1.234,56". "3/4/26" is March 4 in the US but 3 April in most of Europe and Latin America.

Skipping native QA. An English-speaking team can't verify Spanish quality.

Tone drift between languages. English agent sounds warm; Spanish agent sounds stiff. Match the persona across languages.
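
The number and date pitfall is the easiest one to show in code. The sketch below hand-rolls two conventions (US English and Brazilian Portuguese) purely for illustration; the `FORMATS` table is an assumption, and a production system would more likely rely on a locale library backed by CLDR data than on a table like this.

```python
from datetime import date

# Illustrative conventions only; not a complete or authoritative locale table.
FORMATS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "{m}/{d}/{y}"},
    "pt-BR": {"thousands": ".", "decimal": ",", "date": "{d}/{m}/{y}"},
}


def format_amount(value: float, locale_code: str) -> str:
    """Format a number with the locale's thousands and decimal separators."""
    fmt = FORMATS[locale_code]
    us_style = f"{value:,.2f}"  # Python's default grouping, e.g. "1,234.56"
    # Swap separators through a placeholder so the two replacements don't collide.
    return (
        us_style.replace(",", "\0")
        .replace(".", fmt["decimal"])
        .replace("\0", fmt["thousands"])
    )


def format_short_date(d: date, locale_code: str) -> str:
    """Format a date in the locale's short numeric order."""
    return FORMATS[locale_code]["date"].format(d=d.day, m=d.month, y=d.year % 100)


march_4 = date(2026, 3, 4)
us_date = format_short_date(march_4, "en-US")  # "3/4/26"
br_date = format_short_date(march_4, "pt-BR")  # "4/3/26"
```

The same calendar day produces two different strings, and the string "3/4/26" would be read as two different days. For a voice agent, it is often safer to have the prompt or TTS layer say the month by name so the ambiguity never reaches the caller.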

Per-language metrics

Track metrics per language separately:

Resolution rate
CSAT
Average handle time
Escalation rate

Metrics often differ meaningfully from one language to another. That's useful signal, not noise.
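
A minimal sketch of the per-language breakdown, assuming each call record is a dict with the hypothetical keys `language`, `resolved`, `escalated`, `handle_seconds`, and an optional `csat` score. Your call logs will use different field names.

```python
from collections import defaultdict


def metrics_by_language(calls):
    """Aggregate the four metrics separately for each language."""
    groups = defaultdict(list)
    for call in calls:
        groups[call["language"]].append(call)

    report = {}
    for language, rows in groups.items():
        n = len(rows)
        # CSAT is optional: many callers never answer the survey.
        scores = [r["csat"] for r in rows if r.get("csat") is not None]
        report[language] = {
            "calls": n,
            "resolution_rate": sum(r["resolved"] for r in rows) / n,
            "escalation_rate": sum(r["escalated"] for r in rows) / n,
            "avg_handle_seconds": sum(r["handle_seconds"] for r in rows) / n,
            "avg_csat": sum(scores) / len(scores) if scores else None,
        }
    return report


# Hypothetical call records.
calls = [
    {"language": "en", "resolved": True, "escalated": False, "handle_seconds": 100, "csat": 5},
    {"language": "en", "resolved": True, "escalated": False, "handle_seconds": 200, "csat": None},
    {"language": "es", "resolved": True, "escalated": False, "handle_seconds": 300, "csat": 3},
    {"language": "es", "resolved": False, "escalated": True, "handle_seconds": 500, "csat": 4},
]
report = metrics_by_language(calls)
```

Blending these rows into one global number would hide the gap between the two languages, which is exactly the signal worth looking at.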

Operational considerations

Escalation paths. If the agent escalates, you need native-speaker humans on the other side. Plan the staffing.

Knowledge base. Translate and localize the KB, not just the prompt. Re-embed.

Compliance. Any required disclosures must be delivered in the local jurisdiction's language.

Customer comms. Confirmation emails, SMS — all in the right language.

FAQ

How many languages can one agent handle? Technically unlimited. Operationally, 3-5 is manageable. More than that, consider separate agents per language.

Can I use the same KB for multiple languages? The KB needs to be in the target language. Auto-translation at query time usually performs worse than a pre-translated KB.

Will my costs double with a second language? No — the AI platform costs scale with usage, not language count. Your ops costs may scale.

Should I pilot multilingual on a single intent? Yes — same philosophy as any deployment. One intent, one language, prove it, expand.

What about rarely-spoken languages? Quality is materially lower for low-resource languages. Test extensively before launch.

Multilingual Support: When and How to Add a Second Language

