The model has no memory beyond what you put in its context window. For a 5-minute support call this is fine. For longer calls, multi-call interactions, or agents that need to remember preferences across sessions, you need an explicit memory layer. The shape of that layer is more interesting than it first looks.

TL;DR

Memory has three scopes: in-turn (the prompt), in-call (the running transcript), cross-call (persistent storage).
For long calls, sliding window + periodic summarization beats trying to fit everything.
For cross-call memory, key on the caller's stable identifier (phone number, account ID).
Don't store everything. Store what's actionable.

The three scopes

In-turn. The contents of the LLM's prompt for a single turn. Includes system prompt, recent conversation history, retrieved RAG, and function results. Ephemeral.

In-call. State that persists across turns within a single call. Most platforms include this by default — the running transcript flows into each turn's prompt.

Cross-call. State that persists across separate calls from the same caller. Requires a database keyed on caller identity.

The scopes are layered: in-turn pulls from in-call which pulls from cross-call.

In-call memory tactics

For most calls (under 15 turns, under 10 minutes), the default in-call memory is fine. The full transcript fits in the prompt; the model can reason over the whole thing.

For longer calls, three patterns:

Sliding window

Keep the last N turns verbatim. Drop or summarize older ones.

[System prompt]
[Older turns summarized: "Caller introduced themselves as
Sarah, asked about her order #4521. Confirmed shipping
delay. Discussed compensation options."]
[Last 6 turns verbatim]
[Current turn]

Pros: bounded prompt size; preserves recent detail. Cons: loses precise older context; older summary can drift.

Periodic re-summarization

Every N turns, the orchestration layer asks the model to re-summarize the call so far. The summary replaces the full older transcript.

Pros: summaries get refined as the call progresses. Cons: extra LLM cost; summary quality varies.

Structured slot tracking

Maintain explicit fields for important info captured during the call:

{
  caller_name: "Sarah Chen",
  account_id: "1976432",
  intent: "shipping_delay_complaint",
  resolution_status: "in_progress",
  promised_action: "supervisor will call back tomorrow"
}

Inject into the prompt instead of relying on the model to remember.

Pros: precise; queryable; good for evals. Cons: requires schema design; manual upkeep.

In practice, most production agents combine sliding window + structured slots.

Cross-call memory

When the same caller comes back, ideally the agent picks up where they left off. Implementation:

Identify the caller. Phone number is the simplest key. Account ID if you have it.

Store call summaries. After each call, generate a 2–3 sentence summary; persist keyed on caller ID.

Surface relevant memory at call start. Pull recent summaries; inject into the system prompt for the new call.

Update on new info. As the new call progresses, update preferences, resolved issues, and any ongoing threads.

For the deeper take, see how to give a voice agent long-term memory.

What to remember

Resist storing everything. Focus on:

Caller identity (name, contact info)
Recent intents (what they called about)
Open commitments (things you promised to do)
Strong preferences (time, channel, style)
Resolved issues (so the agent doesn't ask again)

Don't store:

Full transcripts of every call (too noisy; use summaries)
Sensitive PII not needed for the task
Agent's interpretation of caller's mood (creepy and often wrong)

The privacy angle

Cross-call memory is a privacy commitment. Best practices:

Disclose. Tell users you remember. "I see we talked last week about your prescription."

Allow opt-out. "Want me to forget our previous conversations?"

Honor deletion. When a user requests deletion, actually delete from the memory store, not just the transcript.

Limit retention. 90 days for most use cases. Longer for loyalty programs with explicit consent.

Implementation choices

A few real options:

Database-backed structured store. Postgres or similar. Best for structured slots, customer profiles, preferences.

Vector-backed memory. Embed past interactions; retrieve relevant ones at call start. Good for unstructured "things that came up before."

Hybrid. Structured fields in Postgres + vector recall for fuzzier patterns. Most scalable.

For first builds, start with database-backed. Add vector recall when you have enough interaction history to make it useful.

Eval for memory

Memory layers add complexity; they need their own evals.

Test cases:

Caller calls back about the same issue. Does the agent recognize it?
Caller calls about a new issue. Does the agent avoid bringing up irrelevant old context?
Caller asks the agent to forget. Does it actually forget?
Caller's preferences change. Does the memory update?

Run these on every memory layer change.

When skipping memory is right

A few cases where you should not build cross-call memory:

One-off transactional calls (order status, password reset).
High-volume B2C anonymous calls.
Use cases where compliance makes long retention risky.
First builds — get the agent shipping; add memory later if needed.

FAQ

How long should call summaries be? 2–3 sentences. Captures the essentials without bloating future prompts.

Can the model write its own summary at call end? Yes — most platforms do this automatically. Just prompt the model to "write a 2-sentence summary of this call: ..."

What if my use case has no caller identity? Skip cross-call memory. In-call memory is still useful.

Does memory hurt latency? Marginally — adds 100–200 tokens to the prompt. Negligible compared to other latency drivers.

What about adversarial prompt injection through memory? A real risk. Sanitize stored summaries; don't let user input flow into future prompts unfiltered.

Building a Conversation Memory Layer for Voice Agents

TL;DR

The three scopes

In-call memory tactics

Sliding window

Periodic re-summarization

Structured slot tracking

Cross-call memory

What to remember

The privacy angle

Implementation choices

Eval for memory

When skipping memory is right

FAQ

More from Tyler Weitzman

Open-Source vs Proprietary Voice Agent Stacks

Build vs Buy: When to Build Your Own Voice Agent

Voice Agents for Developer Support

Related reading

The Role of Embeddings in Voice Agent Knowledge

How to Give a Voice Agent Long-Term Memory

Designing Voice Agents That Ask Better Questions

Voice AI, twice a month.