How to Integrate Voice Agents with a Custom REST API

Tyler Weitzman
March 31, 2026 · 7 min read
Speechify

Most voice agent integrations are with off-the-shelf systems — Salesforce, HubSpot, Zendesk, Stripe. But eventually every production deployment needs to integrate with a custom internal API — the billing system, the proprietary order management, the ops dashboard that only your company has. Doing this right is the difference between a voice agent that does what your business actually needs and one that stops short at what's in the integration marketplace. Fortunately, custom REST API integration is one of the most straightforward parts of voice agent engineering.

TL;DR

  • Model each API operation as an LLM function the agent can call.
  • Authenticate per API's requirements (OAuth, API keys, JWT, etc.).
  • Handle errors, retries, and latency explicitly.
  • Validate input and output — don't trust the LLM's generated arguments blindly.
  • Instrument with logs and metrics for debugging.

The pattern

Voice agent integrations with custom APIs follow a standard pattern:

  1. Define the function — the LLM-facing description of what the action does.
  2. Define the schema — arguments the LLM can provide.
  3. Implement the handler — your code that makes the HTTP request.
  4. Handle responses — parse, validate, transform for the LLM.
  5. Return to the LLM — response fed back into the conversation.

See function calling for voice agents: a practical guide.
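
The five steps above can be sketched as a small registry plus a dispatcher. This is a minimal illustration, not a prescribed framework: the `TOOLS` registry, `register_tool`, `dispatch`, and the `get_order_status` function are hypothetical names, and the tool call is assumed to arrive as a dict with a `name` and a JSON-encoded `arguments` string. A canned response stands in for the HTTP request so the sketch runs on its own.

```python
import json

# Hypothetical registry mapping function names to their schema and handler.
TOOLS = {}

def register_tool(name, description, parameters):
    """Steps 1-2: attach an LLM-facing description and argument schema to a handler."""
    def decorator(handler):
        TOOLS[name] = {
            "schema": {"name": name, "description": description, "parameters": parameters},
            "handler": handler,
        }
        return handler
    return decorator

@register_tool(
    name="get_order_status",
    description="Look up the status of an order by its ID",
    parameters={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
)
def get_order_status(order_id):
    # Step 3: a real handler would make the HTTP request here.
    # A canned response stands in for the API call in this sketch.
    raw = {"order": {"id": order_id, "state": "SHIPPED", "warehouse_code": "W-17"}}
    # Step 4: keep only the fields the LLM needs.
    return {"order_id": raw["order"]["id"], "status": raw["order"]["state"].lower()}

def dispatch(tool_call):
    """Step 5: run the requested handler and return a JSON string for the conversation."""
    tool = TOOLS.get(tool_call["name"])
    if tool is None:
        return json.dumps({"error": "unknown_function"})
    try:
        args = json.loads(tool_call["arguments"])
        result = tool["handler"](**args)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or arguments that do not match the handler signature.
        result = {"error": "invalid_arguments"}
    return json.dumps(result)
```

Calling `dispatch({"name": "get_order_status", "arguments": '{"order_id": "A1"}'})` returns the JSON string that would be fed back to the LLM.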

Example: custom billing API

Suppose your billing system has:

  • GET /customers/{id} — fetch customer.
  • GET /invoices?customer_id={id}&status=open — list open invoices.
  • POST /payments — create a payment.

Voice agent integration:

Function definition for LLM:

{
  "name": "lookup_customer_invoices",
  "description": "Look up open invoices for a customer by phone or email",
  "parameters": {
    "type": "object",
    "properties": {
      "phone": {"type": "string"},
      "email": {"type": "string"}
    }
  }
}

Handler:

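def lookup_customer_invoices(phone=None, email=None):
    # The schema marks neither field as required, so check for at least one.
    if not phone and not email:
        return {"error": "missing_identifier"}

    # Find the customer, sending only the identifiers that were provided.
    query = {k: v for k, v in {"phone": phone, "email": email}.items() if v}
    customer = call_billing_api("GET", "/customers/search", query)
    if not customer:
        return {"error": "customer_not_found"}
    # call_billing_api returns a dict with an "error" key when the request fails.
    if "error" in customer:
        return {"error": "customer_lookup_failed"}

    # List the customer's open invoices.
    invoices = call_billing_api("GET", "/invoices", {"customer_id": customer["id"], "status": "open"})
    if isinstance(invoices, dict) and "error" in invoices:
        return {"error": "invoice_lookup_failed"}

    return {
        "customer_name": customer["name"],
        "invoice_count": len(invoices),
        "total_outstanding": sum(i["amount"] for i in invoices),
        "invoices": [{"id": i["id"], "amount": i["amount"], "due_date": i["due_date"]} for i in invoices]
    }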

Handler for payment:

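def create_payment(customer_id, invoice_id, amount, payment_method_id):
    # current_call_id is assumed to come from the active call's context.
    # Deriving the key from the call and invoice lets the API deduplicate retried requests.
    response = call_billing_api("POST", "/payments", {
        "customer_id": customer_id,
        "invoice_id": invoice_id,
        "amount": amount,
        "payment_method_id": payment_method_id,
        "idempotency_key": f"call_{current_call_id}_invoice_{invoice_id}"
    })
    # call_billing_api returns {"error": ...} on failure, so don't index "status" directly.
    if "error" in response:
        return {"success": False, "confirmation_number": None, "error": response["error"]}
    return {
        "success": response.get("status") == "completed",
        "confirmation_number": response.get("confirmation_number"),
        "error": response.get("error_message")
    }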

Authentication

Support whatever your API requires:

  • API keys in headers — simplest, least secure.
  • OAuth 2.0 client credentials — for service-to-service.
  • JWT — for signed, short-lived tokens.
  • mTLS — for sensitive internal APIs.

Store secrets securely (vault, secret manager). Never hardcode.
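
For the OAuth 2.0 and JWT options, handlers usually want a cached bearer token that is refreshed shortly before it expires, so not every tool call pays for a token round trip. The sketch below shows one way to do that. `TokenCache` and `auth_headers` are hypothetical names, and `fetch_token` is an assumed callable you supply: in practice it would call your token endpoint using credentials read from your secret manager and return a `(token, expires_in_seconds)` pair.

```python
import time

class TokenCache:
    """Caches a bearer token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin_s=30, clock=time.monotonic):
        self._fetch_token = fetch_token          # callable returning (token, expires_in_seconds)
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock                      # injectable so tests can control time
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh when there is no token yet or the current one is inside the margin.
        if self._token is None or now >= self._expires_at - self._refresh_margin_s:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

def auth_headers(cache):
    """Build the Authorization header for an outgoing API request."""
    return {"Authorization": f"Bearer {cache.get()}"}
```

The refresh margin keeps a token from expiring mid-request; 30 seconds is an arbitrary default, not a recommendation from any particular provider.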

Error handling

APIs fail. Your handlers must handle:

  • Network errors. Timeout, connection refused. Retry with backoff.
  • 4xx errors (other than 429). Bad request, not found, unauthorized. Don't retry; return a clean error to the LLM.
  • 5xx errors. Server error, service unavailable. Retry with backoff.
  • Rate limiting (429). Respect Retry-After header.

Example:

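import time
import requests  # assumption: the requests library is the HTTP client

BASE_URL = "https://billing.internal.example.com"  # placeholder; use your API's base URL

# exponential_backoff(attempt) is a helper that returns a delay in seconds.
def call_billing_api(method, path, params=None, retries=3, timeout=5):
    for attempt in range(retries):
        try:
            # Send params as a query string for GET and as a JSON body otherwise.
            kwargs = {"params": params} if method == "GET" else {"json": params}
            response = requests.request(method, BASE_URL + path, timeout=timeout, **kwargs)
        except (requests.Timeout, requests.ConnectionError):
            time.sleep(exponential_backoff(attempt))
            continue
        if response.status_code == 429:
            # Retry-After can also be an HTTP date; fall back to 1 second in that case.
            retry_after = response.headers.get("Retry-After", "1")
            time.sleep(int(retry_after) if retry_after.isdigit() else 1)
            continue
        if 400 <= response.status_code < 500:
            return {"error": "client_error", "details": response.text}
        if response.status_code >= 500:
            time.sleep(exponential_backoff(attempt))
            continue
        return response.json()
    return {"error": "max_retries_exceeded"}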
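
The example above calls an `exponential_backoff` helper without defining it. One possible version is sketched below, assuming "full jitter" (a random delay between zero and an exponentially growing ceiling). The `base_s` and `cap_s` defaults are illustrative choices: on a live call, total retry time has to fit inside the caller's patience, so the cap is kept short.

```python
import random

def exponential_backoff(attempt, base_s=0.25, cap_s=2.0, rng=random.random):
    """Return a delay in seconds drawn from [0, min(cap_s, base_s * 2**attempt))."""
    # The ceiling doubles with each attempt but never exceeds cap_s.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Randomizing the delay keeps many concurrent calls from retrying in lockstep.
    return rng() * ceiling
```

With three attempts and these defaults, the handler spends at most 0.25 + 0.5 + 1.0 seconds sleeping, on top of the request timeouts themselves.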

Latency awareness

Voice is real-time. Slow APIs hurt.

  • Cap timeouts aggressively (5–10 seconds max).
  • Cache reads where safe.
  • Pre-fetch context at call start (parallel with greeting).
  • Async writes where the caller doesn't need immediate confirmation.
  • Graceful degradation if API is slow.

If your API can't respond in time, either cache aggressively or plan a graceful fallback ("let me take your info and call you back").
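
Pre-fetching in parallel with the greeting and falling back when the API is slow can be sketched with `asyncio`. The function names here (`fetch_with_deadline`, `start_call`) and the `{"degraded": True}` fallback value are hypothetical. `play_greeting` and `fetch_context` are assumed to be coroutine functions supplied by your voice stack and your API client.

```python
import asyncio

async def fetch_with_deadline(fetch, deadline_s, fallback):
    """Await fetch(), but give up after deadline_s seconds and return the fallback."""
    try:
        return await asyncio.wait_for(fetch(), timeout=deadline_s)
    except asyncio.TimeoutError:
        return fallback

async def start_call(play_greeting, fetch_context, deadline_s=5.0):
    """Start the context lookup first, then greet the caller while it runs."""
    prefetch = asyncio.create_task(
        fetch_with_deadline(fetch_context, deadline_s, fallback={"degraded": True})
    )
    await play_greeting()
    # By the time the greeting ends, the lookup has usually finished or timed out.
    return await prefetch
```

When the result carries the degraded marker, the agent can switch to the fallback script ("let me take your info and call you back") instead of leaving the caller in silence.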

Input validation

The LLM generates arguments. Sometimes it gets creative. Validate before calling your API:

  • Type checking — amount is a number.
  • Range checking — amount > 0 and < reasonable max.
  • Format checking — email is an email, phone is E.164.
  • Business rule checking — refund amount doesn't exceed original payment.

Reject invalid arguments before hitting your API. Better error message, less backend noise.
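
A minimal sketch of these checks, run before any HTTP request is made. The function names, the error codes, the `MAX_PAYMENT_AMOUNT` ceiling, and the "payment must not exceed the invoice balance" rule are all assumptions for illustration; substitute your own business rules. The email pattern is deliberately loose and only catches obviously malformed input.

```python
import re

E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")             # "+" followed by up to 15 digits
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # loose sanity check, not full RFC validation
MAX_PAYMENT_AMOUNT = 10_000                             # hypothetical ceiling, in the API's units

def validate_lookup_args(phone=None, email=None):
    """Return a list of error codes; an empty list means the arguments are usable."""
    errors = []
    if phone is None and email is None:
        errors.append("phone_or_email_required")
    if phone is not None and not (isinstance(phone, str) and E164_RE.match(phone)):
        errors.append("phone_not_e164")
    if email is not None and not (isinstance(email, str) and EMAIL_RE.match(email)):
        errors.append("email_invalid")
    return errors

def validate_payment_args(amount, invoice_balance):
    """Type, range, and business-rule checks for a payment amount."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return ["amount_not_a_number"]
    errors = []
    # Written as "not (in range)" so that NaN is rejected too.
    if not (0 < amount <= MAX_PAYMENT_AMOUNT):
        errors.append("amount_out_of_range")
    if amount > invoice_balance:
        errors.append("amount_exceeds_invoice_balance")
    return errors
```

Returning error codes rather than raising lets the handler pass a specific, readable reason back to the LLM, which can then ask the caller to repeat or correct the value.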

Response transformation

The API's raw response might not be LLM-friendly. Transform:

  • Flatten deeply nested responses.
  • Convert technical codes to human-readable strings ("CS_404" → "customer not found").
  • Strip irrelevant fields (PII, internal IDs the LLM doesn't need).
  • Summarize lists (don't return all 50 invoices; return top 5 and a count).

# Before transformation (raw API response):
{"status": 200, "data": {"customer": {"id": "...", "internal_flags": [...], ...}}}

# After transformation (LLM-friendly):
{"customer_name": "Jamie Patel", "status": "active", "account_tier": "premium"}
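
The transformations listed above might look like the following sketch. The raw field names (`name`, `status`, `tier`), the fallback error message, and the choice to rank invoices by earliest due date are assumptions; only the `CS_404` mapping comes from the text. Invoice IDs are kept in the summary because the payment function needs one.

```python
# Map technical codes to strings the LLM can read aloud or reason about.
ERROR_MESSAGES = {"CS_404": "customer not found"}

def humanize_error(code):
    """Translate an API error code, with a generic message for unknown codes."""
    return ERROR_MESSAGES.get(code, "unexpected error")

def transform_customer(raw):
    """Flatten the nested response and keep only LLM-relevant fields."""
    customer = raw.get("data", {}).get("customer", {})
    return {
        "customer_name": customer.get("name"),
        "status": customer.get("status"),
        "account_tier": customer.get("tier"),
    }

def summarize_invoices(invoices, limit=5):
    """Return a total count plus the first few invoices, earliest due date first."""
    # ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
    ordered = sorted(invoices, key=lambda inv: inv["due_date"])
    return {
        "invoice_count": len(invoices),
        "invoices": [
            {"id": inv["id"], "amount": inv["amount"], "due_date": inv["due_date"]}
            for inv in ordered[:limit]
        ],
    }
```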

Logging and observability

Every API call should produce:

  • Request log (sanitized).
  • Response log (sanitized).
  • Latency.
  • Success/failure.
  • Retries.

Aggregate:

  • Success rate per endpoint.
  • Latency distribution.
  • Error rate by type.

Integrate with your observability platform (Datadog, Grafana, whatever).
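
One way to get a sanitized request log, latency, and success/failure for every handler is a decorator, sketched below with the standard `logging` module. The logger name, the `SENSITIVE_KEYS` set, and the convention that a truthy `"error"` field in the result means failure are assumptions. Retry counts would need to be reported from inside the HTTP helper and are not covered here.

```python
import functools
import logging
import time

logger = logging.getLogger("voice_agent.api")  # hypothetical logger name
SENSITIVE_KEYS = {"phone", "email", "payment_method_id"}  # fields to redact; adjust to your data

def sanitize(payload):
    """Replace sensitive values before they reach the logs."""
    return {k: ("[redacted]" if k in SENSITIVE_KEYS else v) for k, v in payload.items()}

def instrumented(endpoint):
    """Log outcome, latency, and sanitized arguments for each handler call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            start = time.perf_counter()
            outcome = "failure"  # stays "failure" if the handler raises
            try:
                result = func(**kwargs)
                failed = isinstance(result, dict) and result.get("error")
                outcome = "failure" if failed else "success"
                return result
            finally:
                latency_ms = (time.perf_counter() - start) * 1000
                logger.info(
                    "endpoint=%s outcome=%s latency_ms=%.1f args=%s",
                    endpoint, outcome, latency_ms, sanitize(kwargs),
                )
        return wrapper
    return decorator
```

Applying `@instrumented("lookup_customer_invoices")` to a handler produces one structured line per call, which most log pipelines can turn into per-endpoint success rates and latency distributions.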

Security

  • Secrets never in code. Use a secret manager.
  • Least-privilege API keys. Each voice agent integration should have a key scoped to what it needs, no more.
  • Audit logs of API usage.
  • Rotate keys periodically.
  • Monitor for anomalous usage.

Rate limiting and throttling

Your API has limits. Respect them:

  • Client-side rate limiting — don't overwhelm your own API.
  • Backoff on 429 — respect Retry-After.
  • Circuit breakers — temporarily stop calling an API if it's failing.
  • Quotas per integration — prevent one voice agent runaway from hammering.
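
As an illustration of the circuit-breaker item, here is a minimal sketch. The class name, the thresholds, and the half-open behavior (allowing trial requests once the cool-down has elapsed) are design assumptions rather than a reference implementation. The handler is expected to check `allow_request()` before calling the API and to report the result afterwards.

```python
import time

class CircuitBreaker:
    """Stops calls to a failing API for a cool-down period, then allows a retry."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Once the cool-down has elapsed, let requests through again (half-open).
        return self._clock() - self._opened_at >= self._reset_after_s

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._failure_threshold:
            # Open (or re-open) the circuit and restart the cool-down.
            self._opened_at = self._clock()
```

When `allow_request()` returns False, the handler can return an error to the LLM immediately, which on a live call is better than making the caller wait for another timeout.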

Webhooks from your API back to voice agent

Sometimes your API needs to notify the voice agent asynchronously:

  • Payment confirmed (after external processing).
  • Appointment confirmed by provider.
  • Status changed.

The flow: your API sends an inbound webhook to the voice agent system, which updates the call context or triggers a follow-up.

See webhooks 101 for voice agents.
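
A rough sketch of the receiving side, independent of any web framework. The event shape (`call_id`, `type`, `data`), the event type names, and the in-memory `CALL_CONTEXT` store are all hypothetical; a real deployment would use a shared store and would authenticate the sender before trusting the payload.

```python
# call_id -> context dict; an in-memory stand-in for a shared store.
CALL_CONTEXT = {}

# Hypothetical event names matching the three cases listed above.
SUPPORTED_EVENTS = {"payment.confirmed", "appointment.confirmed", "status.changed"}

def handle_webhook(event):
    """Record an inbound event against its call and flag the call for follow-up."""
    call_id = event.get("call_id")
    event_type = event.get("type")
    if not call_id or event_type not in SUPPORTED_EVENTS:
        return {"accepted": False}
    context = CALL_CONTEXT.setdefault(call_id, {"events": []})
    context["events"].append({"type": event_type, "data": event.get("data", {})})
    # The agent loop can poll this flag, or a scheduler can trigger a callback.
    context["needs_follow_up"] = True
    return {"accepted": True}
```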

Versioning

APIs evolve. Voice agent integrations need to survive:

  • Pin to specific API version (URL path or header).
  • Monitor for deprecations.
  • Test upgrades in non-production.
  • Stage rollouts of API version changes.
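
Pinning can be as small as one constant that every request goes through. The sketch below shows both styles mentioned above; the base URL, the `v2` version string, and the `X-API-Version` header name are placeholders, since the right values depend on how your API exposes versions.

```python
from urllib.parse import urljoin

BASE_URL = "https://billing.internal.example.com/"  # placeholder host
API_VERSION = "v2"                                   # hypothetical pinned version

def versioned_url(path, version=API_VERSION):
    """URL-path pinning: /invoices becomes <base>/v2/invoices."""
    return urljoin(BASE_URL, f"{version}/{path.lstrip('/')}")

def versioned_headers(version=API_VERSION):
    """Header pinning: the header name here is an assumption; check your API's docs."""
    return {"Accept": "application/json", "X-API-Version": version}
```

Keeping the version in one place makes a staged rollout a configuration change rather than an edit to every handler.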

Testing

  • Unit tests for each handler.
  • Integration tests against a staging API.
  • Mock server for end-to-end tests.
  • Error path tests — what happens on 500? 429? timeout?
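
Error-path tests are easy to write when the HTTP call is injected. The sketch below uses `unittest.mock` from the standard library against a small stand-in handler; `fetch_invoice_count` and its `http_get` parameter (assumed to return a `(status, body)` tuple) are hypothetical and exist only to keep the example self-contained.

```python
from unittest import mock

def fetch_invoice_count(http_get, customer_id, retries=3):
    """Stand-in handler: retries on 5xx and timeouts, never retries other 4xx."""
    for _ in range(retries):
        try:
            status, body = http_get(f"/invoices?customer_id={customer_id}&status=open")
        except TimeoutError:
            continue
        if status >= 500:
            continue
        if status >= 400:
            return {"error": "client_error"}
        return {"invoice_count": len(body)}
    return {"error": "max_retries_exceeded"}

def test_retries_then_succeeds():
    # Exception instances in side_effect are raised; other items are returned.
    http_get = mock.Mock(side_effect=[(500, None), TimeoutError(), (200, [{"id": "inv_1"}])])
    assert fetch_invoice_count(http_get, "c1") == {"invoice_count": 1}
    assert http_get.call_count == 3

def test_gives_up_after_max_retries():
    http_get = mock.Mock(return_value=(503, None))
    assert fetch_invoice_count(http_get, "c1") == {"error": "max_retries_exceeded"}
    assert http_get.call_count == 3

def test_client_error_is_not_retried():
    http_get = mock.Mock(return_value=(404, None))
    assert fetch_invoice_count(http_get, "c1") == {"error": "client_error"}
    assert http_get.call_count == 1
```

These run under pytest as written, or can be called directly as plain functions.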

Documentation

Keep documentation of:

  • What each function does.
  • When the voice agent calls it.
  • Expected inputs and outputs.
  • Error modes.
  • Example interactions.

This helps future-you and your team.

Example end-to-end

Caller: "I want to pay my bill."

Agent: "Sure, let me look that up. Can I get your phone number?"

Caller: "555-123-4567."

[LLM calls lookup_customer_invoices(phone="+15551234567")]

[Handler calls GET /customers/search, then GET /invoices?customer_id=...]

[Returns: customer_name="Jamie", invoice_count=1, total_outstanding=247.00]

Agent: "I see a $247 open invoice from March 1st. Want to pay it now with the card on file?"

Caller: "Yes."

[LLM calls create_payment(invoice_id="inv_...", amount=247.00, payment_method_id="pm_...")]

[Handler calls POST /payments with idempotency key]

[Returns: success=true, confirmation_number="PAY-8472"]

Agent: "Payment of $247 confirmed, confirmation number PAY-8472. Anything else?"

Clean integration, clean caller experience.

FAQ

What if our API is GraphQL? Same pattern — define functions that execute specific queries/mutations.

What about legacy SOAP APIs? Wrap them in a REST shim. Expose clean REST to the voice agent.

Can we use the voice agent's LLM to generate API calls dynamically? Discouraged. Predictable function schemas are safer than letting the LLM author arbitrary HTTP.

What about streaming APIs? For long-running operations, use async patterns — voice agent initiates, waits for webhook callback.

How do we handle multi-tenant APIs? Per-tenant configuration. Voice agent's context determines which credentials to use.

Tyler Weitzman
Co-Founder & Head of AI, Speechify

Tyler Weitzman is co-founder and Head of AI at Speechify. He has spent the past decade building the speech-synthesis stack that powers millions of users. Tyler writes about the engineering of real-time conversational systems — text-to-speech, speech recognition, latency budgets, model serving, and the architectural choices that separate prototypes from production-grade voice agents.
