Multi-Agent Architectures for Customer Service

Tyler Weitzman
January 20, 2026 · 5 min read
Speechify

When a single agent gets too complex (too many intents, too many tools, conflicting style requirements), teams reach for multi-agent architectures. A "router" or "supervisor" routes turns to specialized sub-agents (a billing expert, a tech support expert, a returns expert). The pattern works, but it is more complex than a single well-designed agent, so it is worth understanding when to use it and when to skip it.

TL;DR

  • Multi-agent architectures route turns to specialized sub-agents based on intent.
  • They help when a single agent can't hold all the policies and tools without confusion.
  • They add latency and operational complexity. Don't reach for them prematurely.
  • The right time: when your single agent's prompt has grown past 4,000 tokens AND quality is suffering.

What multi-agent looks like

The basic shape:

Caller → Router agent → "This is a billing question" →
  Billing sub-agent → handles turn → routes back to router

Caller → Router agent → "This is a tech support question" →
  Tech support sub-agent → handles turn

Each sub-agent has its own:

  • System prompt focused on its domain
  • Tool set scoped to its tasks
  • Knowledge base subset

The router decides which sub-agent gets each turn (or each call segment).
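
A minimal sketch of that shape in Python, assuming a keyword matcher stands in for the router's LLM call. The names `SubAgent`, `AGENTS`, and `route`, along with the sample prompts, tool names, and knowledge base collections, are hypothetical placeholders rather than any particular framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    system_prompt: str                               # focused on one domain
    tools: list[str] = field(default_factory=list)   # scoped tool set
    kb_collection: str = ""                          # knowledge base subset


# Hypothetical sub-agents; prompts, tools, and collections are placeholders.
AGENTS = {
    "billing": SubAgent("billing", "You handle invoices and refunds.",
                        ["lookup_invoice", "issue_refund"], "kb_billing"),
    "tech_support": SubAgent("tech_support", "You troubleshoot devices.",
                             ["run_diagnostics"], "kb_support"),
    "returns": SubAgent("returns", "You process product returns.",
                        ["create_return_label"], "kb_returns"),
}

# Stand-in for the router's LLM call: a real router would classify intent
# with a model, not with keyword matching.
KEYWORDS = {
    "billing": ("invoice", "charge", "refund"),
    "tech_support": ("error", "crash", "not working"),
    "returns": ("return", "send back"),
}


def route(utterance: str, default: str = "tech_support") -> SubAgent:
    """Pick the sub-agent for one turn; fall back to a default domain."""
    text = utterance.lower()
    for intent, words in KEYWORDS.items():
        if any(w in text for w in words):
            return AGENTS[intent]
    return AGENTS[default]
```

Calling `route("I was charged twice")` returns the billing sub-agent, and the caller code then runs the turn with that sub-agent's prompt, tools, and knowledge base subset.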

Why teams reach for this

Three triggers:

Conflicting prompt rules. The billing agent should be careful and verify; the sales agent should be warm and persuasive. Putting both in one prompt creates conflicting instructions.

Tool count explosion. When your agent has 30+ tools, the LLM struggles to pick the right one. Splitting reduces the choice space per agent.

Knowledge base overload. A single agent retrieving from a 50,000-doc knowledge base produces noisier results than 5 agents each retrieving from 10,000 focused docs.

If you're hitting any of these, multi-agent might help.

Why teams should be cautious

Multi-agent architectures add real cost:

Routing latency. Every turn pays for the router's decision before the actual work starts, typically adding 200–500 ms per turn.

Coordination complexity. Sub-agents need to share context. The orchestration layer becomes its own product.

Eval explosion. Now you have N+1 prompts to test (router + sub-agents). Eval cost multiplies.

Failure modes. Router picks the wrong sub-agent. Sub-agent doesn't know context from the previous sub-agent. The complexity surfaces in subtle bugs.

For most use cases, a single well-designed agent with an 800–2,000-token prompt and 5–15 tools is better than splitting.

When multi-agent is the right call

Reasonable triggers:

  • Single agent prompt is over 4,000 tokens AND eval scores are suffering.
  • More than 25 tools, and the model is confused about which to call.
  • Distinct sub-domains with truly conflicting requirements.
  • Multilingual deployments where each language has its own agent.

If you're nodding at any of these, consider it.
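
The triggers above can be written down as an explicit check, which makes the decision easier to revisit as the agent grows. This is only a sketch: the thresholds come from the list above, while `AgentHealth`, `split_triggers`, and the field names are hypothetical, and how you measure "eval scores are suffering" is up to your own eval setup.

```python
from dataclasses import dataclass


@dataclass
class AgentHealth:
    prompt_tokens: int
    eval_scores_dropping: bool
    tool_count: int
    tool_selection_confused: bool
    conflicting_subdomains: bool = False
    per_language_agents: bool = False


def split_triggers(h: AgentHealth) -> list[str]:
    """Return the names of the triggers that currently apply."""
    triggers = []
    if h.prompt_tokens > 4000 and h.eval_scores_dropping:
        triggers.append("prompt_size")
    if h.tool_count > 25 and h.tool_selection_confused:
        triggers.append("tool_count")
    if h.conflicting_subdomains:
        triggers.append("conflicting_requirements")
    if h.per_language_agents:
        triggers.append("multilingual")
    return triggers
```

An empty result is a signal to keep iterating on the single agent.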

Architecture patterns

Three common shapes:

Router with handoff

The router decides which sub-agent gets the call, then hands off entirely. Sub-agent runs the rest of the call.

Pros: simple, fast (no router involvement after handoff). Cons: hard to switch sub-agents mid-call.
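
A small sketch of the handoff pattern, assuming the classifier and the sub-agent handlers are plain callables standing in for LLM calls. `HandoffSession` and its fields are hypothetical names.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]


class HandoffSession:
    """Routes the first turn, then sends every later turn to the same sub-agent."""

    def __init__(self, classify: Callable[[str], str],
                 handlers: dict[str, Handler]) -> None:
        self.classify = classify
        self.handlers = handlers
        self.active: Optional[str] = None   # set once, at handoff
        self.router_calls = 0

    def handle_turn(self, utterance: str) -> str:
        if self.active is None:
            # The router runs only until a sub-agent owns the call.
            self.active = self.classify(utterance)
            self.router_calls += 1
        return self.handlers[self.active](utterance)
```

Because `active` is never reassigned, later turns skip the router entirely. That is where the speed comes from, and also why a caller who changes topic mid-call stays with the original sub-agent.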

Router per turn

The router decides which sub-agent handles each turn. Sub-agents share context.

Pros: flexible mid-call. Cons: latency on every turn; harder to maintain consistent voice/persona.
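
A sketch of per-turn routing with a shared transcript, again with plain callables standing in for the router and sub-agent LLM calls. `run_call` and the transcript line format are hypothetical.

```python
from typing import Callable

# A handler sees the shared transcript plus the new utterance.
Handler = Callable[[list[str], str], str]


def run_call(turns: list[str],
             classify: Callable[[str], str],
             handlers: dict[str, Handler]) -> tuple[list[str], list[str]]:
    """Route every turn separately while all sub-agents share one transcript."""
    transcript: list[str] = []      # shared context across sub-agents
    handled_by: list[str] = []
    for utterance in turns:
        intent = classify(utterance)            # router call on every turn
        reply = handlers[intent](transcript, utterance)
        transcript += [f"caller: {utterance}", f"{intent}: {reply}"]
        handled_by.append(intent)
    return transcript, handled_by
```

The `classify` call inside the loop is the per-turn latency cost, and the shared `transcript` is what lets a different sub-agent pick up mid-call without losing context.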

Supervisor + workers

A supervisor agent talks to the caller; worker agents are tools the supervisor can call. Workers don't talk to the caller directly.

Pros: clean abstraction; workers can be specialized without persona issues. Cons: more layers; supervisor needs to be smart enough to coordinate.

For most multi-agent voice deployments, supervisor + workers is the cleanest pattern.
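
A sketch of supervisor + workers, assuming workers return structured data and only the supervisor produces caller-facing text. `billing_worker`, `pick_worker`, and `supervisor_turn` are hypothetical, and the keyword check stands in for the supervisor LLM's tool-selection step.

```python
from typing import Callable, Optional

Worker = Callable[[str], dict]      # returns structured data, never caller-facing text


def billing_worker(query: str) -> dict:
    # Placeholder result; a real worker would call billing systems.
    return {"status": "refund_eligible", "amount": 20.0}


WORKERS: dict[str, Worker] = {"billing": billing_worker}


def pick_worker(utterance: str) -> Optional[str]:
    # Stand-in for the supervisor LLM deciding whether to call a worker.
    return "billing" if "refund" in utterance.lower() else None


def supervisor_turn(utterance: str) -> str:
    worker = pick_worker(utterance)
    if worker is None:
        return "Happy to help. Could you tell me a bit more?"
    result = WORKERS[worker](utterance)
    # Only the supervisor words the reply, so the persona stays consistent.
    if result["status"] == "refund_eligible":
        return f"Good news: you're eligible for a refund of ${result['amount']:.2f}."
    return "I checked, and that charge isn't eligible for a refund."
```

Keeping workers limited to structured output is what avoids the persona problem: there is exactly one voice, no matter how many workers sit behind it.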

Implementation considerations

A few practical things:

Shared memory. Sub-agents need to know what's already been said. Either share the full transcript or summarize for hand-offs.
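
One way to choose between the two options, sketched under the assumption that short calls pass the full transcript and long calls pass a summary plus the most recent turns. `handoff_context`, `keep_recent`, and `max_turns` are hypothetical, and `summarize` would normally be an LLM call.

```python
from typing import Callable


def handoff_context(transcript: list[str],
                    summarize: Callable[[list[str]], str],
                    keep_recent: int = 4,
                    max_turns: int = 12) -> list[str]:
    """Pass the full transcript when short; otherwise a summary plus recent turns."""
    if len(transcript) <= max_turns:
        return list(transcript)
    older, recent = transcript[:-keep_recent], transcript[-keep_recent:]
    return [f"summary: {summarize(older)}"] + recent
```

The default limits here are illustrative; tune them against your own latency and context budgets.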

Consistent persona. If sub-agents have different voices, the call feels disjointed. Use the same TTS voice across sub-agents.

Clean transitions. When switching sub-agents (or worker calls), don't say "transferring you internally." The caller shouldn't experience the seam.

Logging. Tag every turn with which sub-agent handled it. Critical for debugging.
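
A minimal sketch of per-turn tagging as one JSON log line per turn. The field names and the `turn_log_record` helper are hypothetical; the point is only that `agent` is recorded on every turn.

```python
import json
import time


def turn_log_record(call_id: str, turn_index: int, agent: str,
                    utterance: str, reply: str) -> str:
    """Build one JSON log line tagged with the sub-agent that handled the turn."""
    return json.dumps({
        "ts": time.time(),
        "call_id": call_id,
        "turn": turn_index,
        "agent": agent,
        "utterance": utterance,
        "reply": reply,
    })
```

With this in place, filtering logs by `call_id` and reading the `agent` field in order shows exactly where a call changed hands.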

What it costs

Concrete cost comparison for a 3-minute call:

  • Single agent: 1 LLM call per turn × 8 turns = 8 LLM calls.
  • Multi-agent (router per turn): 2 LLM calls per turn × 8 turns = 16 LLM calls.
  • Multi-agent (supervisor + workers): variable; supervisor is always called, workers when needed.

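Call count doubles in the router-per-turn case, but router calls are usually short and can run on a smaller model, so expect roughly 30–80% more LLM cost for multi-agent. That is worth it if the quality wins are clear.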
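
The arithmetic can be sketched in a few lines. The per-call costs are hypothetical relative units, not measured prices, and the sketch assumes every turn makes exactly one sub-agent call plus one router call.

```python
def llm_calls(turns: int, calls_per_turn: int) -> int:
    """Total LLM calls for a call with a fixed number of calls per turn."""
    return turns * calls_per_turn


def cost_overhead(agent_call_cost: float, router_call_cost: float) -> float:
    """Extra cost of router-per-turn as a fraction of the single-agent cost.

    Per turn, a single agent costs agent_call_cost, while router-per-turn
    costs agent_call_cost + router_call_cost, so the turn count cancels out.
    """
    return router_call_cost / agent_call_cost
```

Under these assumptions, `llm_calls(8, 2)` is twice `llm_calls(8, 1)`, but the cost overhead is only as large as the router call is relative to a sub-agent call. A router call priced at 0.3 to 0.8 of a sub-agent call corresponds to 30–80% extra cost.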

Don't multi-agent on day one

A common mistake: greenfield teams designing multi-agent from the start. The reasoning ("we'll need it eventually") is sympathetic but wrong.

Build a single agent. Get it to production. Iterate. When you genuinely hit the constraints that motivate multi-agent, then split.

You'll learn things about your use case in the first single-agent build that change what your multi-agent should look like.

A simpler alternative

Before going multi-agent, try:

Tighter single-agent prompt. Often you can cut a 4,000-token prompt to 2,000 by removing redundancy.

Better tool descriptions. Confused tool selection often comes from vague descriptions, not from having too many tools.

Conditional rules in the prompt. "If the caller mentions billing, prioritize the billing tools and skip the rest." Single agent that adapts.

These usually solve 70% of the cases that seem to need multi-agent.
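
As an illustration of the last two ideas, here is a vague tool description next to a more specific one, plus a conditional rule as a prompt fragment. The tool names and the dictionary layout are hypothetical; the layout is loosely modeled on JSON Schema style function definitions and is not tied to any specific vendor's API.

```python
# Vague: the model has to guess when this applies.
vague_tool = {
    "name": "lookup",
    "description": "Looks up account info.",
}

# Specific: says what it fetches, when to call it, and when not to.
specific_tool = {
    "name": "lookup_invoice",
    "description": (
        "Fetch one invoice by invoice ID. Call this when the caller asks about "
        "a specific charge or invoice. Do not use it for refunds; use "
        "issue_refund instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
}

# Conditional rule appended to the single agent's system prompt.
CONDITIONAL_RULE = (
    "If the caller mentions billing, prioritize the billing tools "
    "and skip the rest."
)
```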

FAQ

How many sub-agents is too many? More than 5–7 starts to be more complexity than benefit. If you need more, your problem is probably bigger than multi-agent solves.

Can sub-agents be different LLMs? Yes — and often should be. Use a fast small model for simple routing, a larger one for complex reasoning sub-agents.

Does the user notice multi-agent vs single-agent? If done well, no. Done poorly, the seams show (different voice, lost context).

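What about agent-to-agent communication protocols? Standards are still emerging. Anthropic's MCP standardizes how models connect to tools and data, Google's A2A targets agent-to-agent communication, and OpenAI's Swarm is an experimental orchestration framework rather than a protocol. For most production deployments, in-house orchestration is fine.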

How do I evaluate a multi-agent system? Test the router separately (intent accuracy) and each sub-agent separately (their own rubrics) plus end-to-end calls.
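
For the router half of that evaluation, a sketch of intent accuracy over a labeled set. `router_accuracy` is a hypothetical helper, and `classify` stands in for whatever function wraps your router.

```python
from collections import Counter
from typing import Callable


def router_accuracy(classify: Callable[[str], str],
                    labeled: list[tuple[str, str]]) -> tuple[float, Counter]:
    """Return overall intent accuracy and a count of (expected, predicted) misses."""
    if not labeled:
        raise ValueError("labeled set must not be empty")
    misses: Counter = Counter()
    correct = 0
    for utterance, expected in labeled:
        predicted = classify(utterance)
        if predicted == expected:
            correct += 1
        else:
            misses[(expected, predicted)] += 1
    return correct / len(labeled), misses
```

The `misses` counter shows which intents get confused with which, which is usually more actionable than the single accuracy number.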

Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems — text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
