
Multilingual Support: When and How to Add a Second Language

Rohan Pavuluri
February 7, 2026 · 5 min read
Speechify

Adding a second language to an AI voice agent feels simple on paper — the models support it, the TTS is available, switch a flag. In practice, good multilingual support is a project. Done well, it unlocks new markets. Done poorly, it confuses customers in both languages. This is the practical playbook.

TL;DR

  • Add a second language when you have clear volume in that language, not speculatively.
  • Translate the system prompt; don't just feed the English one to a multilingual model.
  • Native speakers must review the prompt and test real calls.
  • Expect 2-4 weeks of work for a properly done second language.

When to add a second language

Triggers:

Measurable volume. 10%+ of your customers prefer a language you don't support.

Geographic expansion. New market where the primary language isn't English.

Competitive pressure. Competitors offer multilingual; you need parity.

Regulatory requirement. Some jurisdictions require native-language support.

Don't add languages speculatively. "We might need Spanish someday" isn't a reason to build it now.
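The volume trigger is easy to check against data you already have. A minimal sketch, assuming each customer record carries a preferred-language code (the field and the function name are hypothetical) and using the 10% figure above as the cutoff:

```python
from collections import Counter

# Assumed cutoff, taken from the "10%+" volume trigger above.
VOLUME_THRESHOLD = 0.10

def languages_worth_adding(preferred_languages, supported, threshold=VOLUME_THRESHOLD):
    """Return {language: share} for unsupported languages at or above the threshold.

    preferred_languages: one language code per customer (hypothetical CRM field).
    supported: set of language codes the agent already handles.
    """
    total = len(preferred_languages)
    if total == 0:
        return {}
    counts = Counter(preferred_languages)
    return {
        lang: count / total
        for lang, count in counts.items()
        if lang not in supported and count / total >= threshold
    }

# Illustrative data only: 80 English, 15 Spanish, 5 French customers.
customers = ["en"] * 80 + ["es"] * 15 + ["fr"] * 5
print(languages_worth_adding(customers, supported={"en"}))
```

With this sample, Spanish clears the bar and French does not, which is the kind of evidence that justifies the project.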

What "multilingual" actually requires

Five things:

1. STT in the target language. Your speech recognizer needs to transcribe the language accurately.

2. TTS in the target language. Your text-to-speech needs to speak it naturally.

3. LLM that handles the language well. Most frontier LLMs are strong in major languages; test less widely spoken ones yourself before committing.

4. A translated + localized system prompt. Not just translated — localized. Different greeting conventions, register, politeness norms.

5. Native-speaker testing. Real calls from real native speakers before launch.

Miss any of these and your "multilingual" agent feels broken.
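One way to keep the five requirements honest is to track them as an explicit launch checklist per language. A minimal sketch; the class and field names are assumptions made for illustration, not part of any platform:

```python
from dataclasses import dataclass, fields

@dataclass
class LanguageReadiness:
    """One flag per requirement in the list above (names are illustrative)."""
    stt_validated: bool = False
    tts_validated: bool = False
    llm_validated: bool = False
    prompt_localized: bool = False
    native_testing_done: bool = False

    def missing(self):
        # Names of the requirements that are still unmet, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready(self):
        return not self.missing()

spanish = LanguageReadiness(stt_validated=True, tts_validated=True, llm_validated=True)
print(spanish.ready(), spanish.missing())
```

Here the models are checked but the prompt and native testing are not, so the language is not ready to launch.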

Translating the system prompt

The common mistake: translate English → Spanish with ChatGPT; deploy.

The better approach:

  1. Have a native speaker translate.
  2. Adapt register for the target language (Spanish has tú vs usted; Japanese has formal/informal distinctions; etc.).
  3. Adjust cultural references (idioms, currency format, examples).
  4. Re-test with native callers.

Budget: 4-8 hours of native-speaker work per language.
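To stop a machine-translated or unreviewed prompt from reaching production, the prompt lookup itself can enforce the review step. A sketch under assumptions: the registry layout, the locale keys, and the sample prompt text are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocalizedPrompt:
    text: str
    register: str              # e.g. "usted" for formal Spanish
    reviewed_by_native: bool   # set only after a native speaker signs off

# Hypothetical registry; real prompts would be much longer.
PROMPTS = {
    "en-US": LocalizedPrompt("You are a friendly support agent.", "neutral", True),
    "es-MX": LocalizedPrompt(
        "Atiende al cliente con amabilidad y trátalo siempre de usted.", "usted", False
    ),
}

def system_prompt_for(locale, prompts=PROMPTS):
    """Return the prompt text, refusing to fall back silently to English."""
    prompt = prompts.get(locale)
    if prompt is None:
        raise LookupError(f"No localized prompt for {locale}")
    if not prompt.reviewed_by_native:
        raise ValueError(f"Prompt for {locale} has not had native-speaker review")
    return prompt.text
```

Failing loudly is a deliberate choice here: a missing or unreviewed prompt becomes a deployment error rather than a confusing call.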

Picking STT and TTS

For English → Spanish (as an example):

STT:

  • Deepgram: strong Spanish support, streaming.
  • AssemblyAI: good.
  • Whisper (self-hosted): good accuracy, but it was not built for streaming, so expect higher latency on live calls.

TTS:

  • Simba Multilingual v2: many Spanish voices, high quality.
  • OpenAI TTS: multilingual built-in.
  • Cartesia: growing multilingual support.

Test each on your specific prompts. Quality varies by speaker style, regional accent, and content domain.
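For STT, a common way to compare vendors on your own recordings is word error rate (WER): the word-level edit distance between a human reference transcript and the recognizer's output, divided by the reference length. A minimal sketch in plain Python; the sample sentences are invented, and real evaluations usually also normalize punctuation and numerals first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# One dropped word ("el") out of seven reference words.
wer = word_error_rate(
    "quiero saber el estado de mi pedido",
    "quiero saber estado de mi pedido",
)
print(round(wer, 3))
```

Run it per vendor over the same set of recordings, split by accent and content domain, and compare averages.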

Handling accents within a language

Spanish isn't one accent. Mexican, Castilian, Colombian, Argentinian all sound different. Same for:

  • English (US, UK, Australian, Indian)
  • French (European, Canadian)
  • Portuguese (Brazilian, European)

For each language, decide which accent to use:

  • Match your primary customer base.
  • Use the accent most neutral across regions (harder to define).
  • Offer multiple voices if you serve multiple regions.
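If you do serve several regions, voice choice can be a simple lookup: exact regional match first, then a language-level default. A sketch with made-up voice IDs; the locale codes follow the usual language-region pattern, and nothing here refers to a real provider's catalog.

```python
# Hypothetical voice IDs keyed by locale, with a language-only key as the fallback.
VOICES = {
    "es-MX": "voice_es_mx_01",
    "es-ES": "voice_es_es_01",
    "es": "voice_es_neutral_01",
    "en-US": "voice_en_us_01",
}

def pick_voice(locale, voices=VOICES):
    """Prefer an exact regional voice, then the language default, else fail loudly."""
    if locale in voices:
        return voices[locale]
    language = locale.split("-")[0]
    if language in voices:
        return voices[language]
    raise LookupError(f"No voice configured for {locale}")

print(pick_voice("es-MX"))  # exact regional match
print(pick_voice("es-AR"))  # no Argentinian voice configured, falls back to the "es" default
```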

Language detection

Two approaches:

Upfront selection. Caller picks language via IVR or DTMF at call start. Clean but adds friction.

Auto-detection. Model detects language from the caller's first utterance, responds in that language.

Auto-detection works well for major languages but falls apart with mixed-language speakers. For B2C deployments with a strong language mix, default to explicit selection.
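The two approaches combine naturally: trust auto-detection only when it is confident, and otherwise fall back to asking. A minimal sketch, assuming your detector reports a language code plus a confidence score; the `Detection` type and the 0.8 threshold are illustrative assumptions to tune on your own calls.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    language: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical detector

def choose_language(detection, supported, min_confidence=0.8):
    """Return a language code, or None to signal 'ask the caller explicitly'."""
    if detection.language in supported and detection.confidence >= min_confidence:
        return detection.language
    return None

supported = {"en", "es"}
print(choose_language(Detection("es", 0.95), supported))  # confident: proceed in Spanish
print(choose_language(Detection("es", 0.55), supported))  # unsure: fall back to a menu
```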

Code-switching

Some speakers mix languages mid-sentence ("envíame el order status please"). Most multilingual LLMs understand this reasonably well but may respond in an unexpected language.

Practical rule: the agent responds in the language the customer predominantly speaks in. If they switch mid-call, the agent can switch too — but gracefully.
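One way to implement "predominantly" is a short rolling window over per-utterance language labels, switching only when another language holds a strict majority. This is a sketch of that idea, not a standard algorithm; the class name and window size are assumptions, and the per-utterance labels would come from your own detector.

```python
from collections import Counter, deque

class ResponseLanguageTracker:
    """Tracks which language the agent should answer in during one call."""

    def __init__(self, initial, window=5):
        self.current = initial
        self.recent = deque([initial], maxlen=window)

    def observe(self, utterance_language):
        """Record the language of the latest caller utterance and return the reply language."""
        self.recent.append(utterance_language)
        top, count = Counter(self.recent).most_common(1)[0]
        # Require a strict majority of the window so a single mixed utterance
        # does not flip the agent back and forth.
        if top != self.current and count > len(self.recent) / 2:
            self.current = top
        return self.current

tracker = ResponseLanguageTracker("en")
for lang in ["es", "es", "en", "es"]:
    print(lang, "->", tracker.observe(lang))
```

In this run the agent stays in English after the first Spanish utterance, switches after the second, and does not bounce back on the single English one.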

Testing plan

Before launching a new language:

  1. Recruit 5 native speakers.
  2. Have each place 5 test calls covering your main intents.
  3. Record everything.
  4. Review: were responses natural? Was tone right? Did the agent understand accented speech?
  5. Fix issues; repeat.

Plan 1-2 rounds of iteration before real customer traffic.
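The plan above works out to 25 calls per round when you have five main intents. A small sketch for laying out that matrix and tracking review results; the intent names and check names are placeholders chosen to mirror the review questions in step 4.

```python
from itertools import product

# Illustrative review checks, mirroring the questions in step 4.
REVIEW_CHECKS = ("natural", "tone_ok", "understood_accent")

def build_test_matrix(speakers, intents):
    """One planned call per (speaker, intent) pair; checks start as None (unreviewed)."""
    return [
        {"speaker": s, "intent": i, **{check: None for check in REVIEW_CHECKS}}
        for s, i in product(speakers, intents)
    ]

def failed_calls(matrix):
    """Calls where a reviewer marked at least one check as False."""
    return [c for c in matrix if any(c[check] is False for check in REVIEW_CHECKS)]

speakers = [f"speaker_{n}" for n in range(1, 6)]
intents = ["order_status", "refund", "billing", "cancel", "address_change"]  # placeholders
matrix = build_test_matrix(speakers, intents)
matrix[0]["tone_ok"] = False  # a reviewer flags the first call
print(len(matrix), len(failed_calls(matrix)))
```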

Common multilingual pitfalls

Forcing English idioms into the prompt. "Got it" / "sure thing" don't translate. Use native expressions.

Wrong formality register. Using tú (informal) in Spanish business contexts where usted (formal) is expected.

Ignoring number / date format differences. "1,234.56" vs "1.234,56". "3/4/26" is March 4 in the US but 3 April in most of Europe and Latin America.

Skipping native QA. An English-speaking team can't verify Spanish quality.

Tone drift between languages. English agent sounds warm; Spanish agent sounds stiff. Match the persona across languages.
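The number and date pitfall is the easiest of these to show in code. A minimal sketch with a hand-written conventions table; the table is an assumption for illustration, and a real deployment would more likely pull these rules from a localization library backed by CLDR data.

```python
from datetime import datetime

# Hand-written conventions for three locales (illustrative, not exhaustive).
CONVENTIONS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "%m/%d/%y"},
    "en-GB": {"thousands": ",", "decimal": ".", "date": "%d/%m/%y"},
    "de-DE": {"thousands": ".", "decimal": ",", "date": "%d.%m.%y"},
}

def format_amount(value, locale):
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale]
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", conv["thousands"]) + conv["decimal"] + frac

def parse_short_date(text, locale):
    """Interpret a short numeric date using the locale's day/month order."""
    return datetime.strptime(text, CONVENTIONS[locale]["date"]).date()

print(format_amount(1234.56, "en-US"), "vs", format_amount(1234.56, "de-DE"))
print(parse_short_date("3/4/26", "en-US"), "vs", parse_short_date("3/4/26", "en-GB"))
```

The same input string yields two different dates a month apart, which is why dates read aloud or sent in confirmations should be rendered per locale, ideally with the month spelled out.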

Per-language metrics

Track metrics per language separately:

  • Resolution rate
  • CSAT
  • Average handle time
  • Escalation rate

Metrics often differ meaningfully from one language to another. That's useful signal, not noise.
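Splitting the four metrics by language is a plain group-by over call records. A sketch assuming each record has the fields shown; the field names and sample values are hypothetical, and CSAT is allowed to be missing because not every caller answers a survey.

```python
from collections import defaultdict

def metrics_by_language(calls):
    """Aggregate resolution, escalation, handle time, and CSAT per language."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call["language"]].append(call)

    report = {}
    for language, rows in grouped.items():
        n = len(rows)
        ratings = [r["csat"] for r in rows if r["csat"] is not None]
        report[language] = {
            "calls": n,
            "resolution_rate": sum(r["resolved"] for r in rows) / n,
            "escalation_rate": sum(r["escalated"] for r in rows) / n,
            "avg_handle_time_s": sum(r["handle_time_s"] for r in rows) / n,
            "avg_csat": sum(ratings) / len(ratings) if ratings else None,
        }
    return report

# Illustrative records only.
calls = [
    {"language": "en", "resolved": True, "escalated": False, "handle_time_s": 180, "csat": 5},
    {"language": "en", "resolved": True, "escalated": False, "handle_time_s": 220, "csat": None},
    {"language": "es", "resolved": False, "escalated": True, "handle_time_s": 300, "csat": 3},
    {"language": "es", "resolved": True, "escalated": False, "handle_time_s": 260, "csat": 4},
]
for language, stats in metrics_by_language(calls).items():
    print(language, stats)
```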

Operational considerations

Escalation paths. If the agent escalates, you need native-speaker humans on the other side. Plan the staffing.

Knowledge base. Translate and localize the KB, not just the prompt. Re-embed.

Compliance. Required disclosures may need to be delivered in the local jurisdiction's language.

Customer comms. Confirmation emails, SMS — all in the right language.

FAQ

How many languages can one agent handle? Technically unlimited. Operationally, 3-5 is manageable. More than that, consider separate agents per language.

Can I use the same KB for multiple languages? The KB needs to be in the target language. Auto-translation at query time is usually worse than pre-translated.

Will my costs double with a second language? No. AI platform costs scale with usage, not language count. Ops costs, such as native-speaker review and escalation staffing, may rise.

Should I pilot multilingual on a single intent? Yes — same philosophy as any deployment. One intent, one language, prove it, expand.

What about rarely-spoken languages? Quality is materially lower for low-resource languages. Test extensively before launch.

Rohan Pavuluri
Building SIMBA Voice Agents

Rohan Pavuluri builds SIMBA Voice Agents at Speechify. Previously, he founded and led Upsolve, the largest nonprofit in the United States serving low-income Americans through technology. He writes about real-world voice-agent deployments — customer support, outbound sales, AI receptionists — and the practical product, design, and operational lessons that actually move the needle.
