By default, voice agents have no memory beyond the current call. The caller hangs up, the agent forgets everything. For many use cases this is fine. For loyalty-driven businesses where the same caller comes back repeatedly, it's a missed opportunity. Long-term memory is doable but doesn't come free — there are real design choices to make.

TL;DR

Voice agents have three memory layers: in-turn (the prompt), in-call (the running transcript), and cross-call (persistent storage).
Cross-call memory is what most teams want when they say "memory."
Don't try to remember everything; remember what's actionable.
The hard part isn't storing memory — it's surfacing the right memory at the right moment.

The three memory layers

In-turn. The prompt + tool schemas + retrieved context that goes to the LLM on each turn. Ephemeral; reset every turn.

In-call. The running transcript that accumulates during a single call. Lasts until the call ends.

Cross-call. Persistent storage that links to the caller's identity. Looks up "have I talked to this person before?" on call start.

Most "memory" projects are about the third layer.

Why bother with cross-call memory

A few use cases where it matters:

Loyalty / repeat callers. "Hi Sarah, calling about the same issue as last week?" feels much better than starting fresh every time.

Healthcare. Knowing the patient has called twice before about the same prescription issue affects how the agent handles the third call.

B2B account management. The agent knows which products the caller uses, what they last asked about.

Concierge services. "Same usual time?" instead of full discovery.

For one-off transactional calls (order status, password reset), cross-call memory is overkill.

What to store

Resist the temptation to store everything. Most useful patterns:

Caller identity. Phone number, name, account ID. The minimum to recognize the caller next time.

Recent intents. What the caller has called about in the last few weeks.

Resolved issues. What got handled, what didn't.

Preferences. Time preferences, channel preferences, language.

Open commitments. Things the agent or business promised to do.

What NOT to store: full transcripts of every call (too noisy), sensitive PII without need (compliance risk), or "personality notes" the agent has guessed about the caller (creepy).

How to surface the right memory

The hard part. You don't want to dump 5 calls of history into the prompt — that's expensive and confuses the model. You want to surface only what's relevant to this call.

Patterns:

Recent summary. A 2–3 sentence summary of the caller's last 1–3 interactions. Always included.

Conditional recall. If the caller mentions a past issue, the agent can call a lookup_past_interactions(query) function to pull relevant context on demand.

Open thread alert. "This caller has an unresolved ticket from 4 days ago." Surfaced at call start.

Preference signals. "Caller prefers afternoon appointments." Used to bias the agent's suggestions without making the recall explicit.

The privacy angle

Cross-call memory is also a privacy responsibility. Best practices:

Tell users you remember. "I remember our last conversation about your blood pressure medication." Don't surprise them.

Let them opt out. "Want me to forget our previous conversations?" should be supported.

Don't surface sensitive details unprompted. Just because you remember a customer's complaint doesn't mean you should bring it up next call.

Comply with deletion requests. When a customer asks to be forgotten, actually delete the memory store, not just the call recordings.

Implementation

Three architectural patterns:

1. Summary-based memory

After each call, the agent generates a 2-3 sentence summary. Stored in your database keyed by caller ID. Next call, the summary gets prepended to the system prompt.

Pros: simple, cheap, easy to debug. Cons: information loss; summaries get stale.

2. Vector-based memory

After each call, key facts get extracted and embedded. Stored in a vector DB. Next call, the agent can retrieve relevant facts via similarity search.

Pros: scales to many calls; precise recall. Cons: complexity; quality depends on extraction.

3. Structured memory

After each call, structured fields get updated (preferred contact method, last issue category, last resolution). Looked up by ID.

Pros: clean, queryable, easy to surface. Cons: requires schema design upfront.

In practice, most production memory systems combine all three.

What about within-call memory?

For long calls (10+ minutes), in-call memory matters too. The transcript can blow past the LLM's effective context window or just bury important info.

Solutions:

Sliding window (keep last N turns verbatim, summarize older turns).
Periodic re-summarization at every 10 turns.
Structured slot tracking (key facts captured to dedicated state).

For more on this, see building a conversation memory layer for voice agents.

When to skip cross-call memory

A few contexts where skipping memory is the right call:

One-off transactional calls
High-volume B2C support where most callers are anonymous
Use cases where compliance makes long-term storage risky
Pilots — add memory later if needed

FAQ

How long should I retain memory? Match your business need. 90 days is common for support. 1 year for loyalty programs. Always document and disclose retention.

Does memory require a database? For real cross-call memory, yes. Per-call summaries can live in the same DB you use for call logs.

Can the model remember without my building anything? No. LLMs have no persistent memory outside what you put in the prompt.

What if the caller is anonymous? You can still build memory keyed on phone number — most callers don't realize the agent recognizes them by their caller ID.

Is this GDPR-compliant? Depends on what you store and your disclosures. Talk to legal. The user has the right to see what's stored and to request deletion.

How to Give a Voice Agent Long-Term Memory

TL;DR

The three memory layers

Why bother with cross-call memory

What to store

How to surface the right memory

The privacy angle

Implementation

1. Summary-based memory

2. Vector-based memory

3. Structured memory

What about within-call memory?

When to skip cross-call memory

FAQ

More from Tyler Weitzman

Open-Source vs Proprietary Voice Agent Stacks

Build vs Buy: When to Build Your Own Voice Agent

Voice Agents for Developer Support

Related reading

Building a Conversation Memory Layer for Voice Agents

The Role of Embeddings in Voice Agent Knowledge

Designing Voice Agents That Ask Better Questions

Voice AI, twice a month.