Persistent Memory for AI Agents

June 2026 · 7 min read · For developers building with Claude API / Claude Agents

Your AI agent runs a task, does useful work, then exits. Next run: blank slate. It has no memory of what it learned last time, what state it was in, which contacts it already processed, or which decisions it already made.

This is the agent memory problem. It's not a new idea — every serious agent framework mentions it. But most solutions require you to stand up a vector database, wire up embeddings, build a retrieval layer, and manage all of it in production. That's a lot of infrastructure for "remember what you did yesterday."

Stash is a different approach: a hosted key-value + full-text store, accessible to your agent via MCP, with zero infrastructure to manage.

What agents actually need to remember

Most agent memory falls into a few categories:

Category	Example	How long?
Episodic	What happened last run	Hours–days
Procedural	How to handle edge cases	Weeks–months
Semantic	Facts about the domain or users	Long-term
Working	Current task state (resume after crash)	Until done

You don't always need semantic search over all of it. Most agent memory lookups are either "get the most recent state" (episodic) or "find the record about X" (semantic). Stash handles both with context() for standing facts and search() for lookup.

The pattern: agents as MCP clients

Claude agents can call MCP tools. If you wire Stash as an MCP server, your agent gets stash_add, stash_search, context(), and usage() as native tools — no custom code, no API calls, no database queries.

The agent decides what to save. It decides when to load. The memory lives in Stash, survives between runs, and is searchable by text.

# Example: an agent that processes customer requests
# Run 1: Agent processes request, saves outcome

Agent prompt (run 1):
"Process this request from Alex Chen about their order #8821.
When done, use stash_add to save: what you did, the outcome, and any notes
about their account that would be useful next time."

Agent response: "Resolved the refund. Saving to Stash..."
→ stash_add(collection="customer_notes", content="Alex Chen — order 8821
   refund processed 2026-06-08. Preferred contact: email. Noted: sensitive
   about shipping delays, acknowledge proactively next time.")

# Run 2 (a week later): fresh context, but memory survives
Agent prompt (run 2):
"Alex Chen just submitted another request — order #9102."

Agent (calls search first):
→ stash_search("Alex Chen")
← "Alex Chen — order 8821, refund processed. Preferred: email. Sensitive
   about shipping delays..."

Agent: "Found previous history. Alex had a shipping issue before — I'll
acknowledge that upfront before diving into #9102."

The agent gets institutional memory. It gets better over time. And you didn't build a database.

Working state: resuming after failure

Long-running agents (processing a queue, doing research, building a report) crash. When they restart, they start over. With Stash:

# Agent saves checkpoint on each significant step
→ stash_add(collection="agent_state", content=json.dumps({
    "run_id": "run_20260608_001",
    "step": "processed_items_47",
    "last_item_id": "item_1047",
    "results_so_far": [...summary...]
}))

# On restart, agent loads the checkpoint
→ stash_search("run_20260608_001")
← Returns the checkpoint — agent resumes from step 47, not step 0

No Redis. No a database. No custom checkpoint logic beyond "save a record, search for it later."

Comparison: Stash vs. alternatives for agent memory

Option	Setup	Search	Cost/mo	MCP-native?
Stash (free tier)	Paste URL	FTS5 full-text	£0	✓
Stash Pro	Paste URL	FTS5 full-text	£8	✓
Pinecone	Signup + API + embeddings	Vector similarity	$25+	Custom code
Supabase + pgvector	DB setup + schema + embeddings	Vector + queries	$25+	Custom code
a structured store local file	Code + schema	FTS5 (if configured)	$0	No
Mem0	Signup + SDK integration	Semantic	$9+	No

Stash is not the most powerful option. If you need sub-millisecond vector search over 10M embeddings, use Pinecone. But for the common case — agent memory that needs to survive runs and be searchable by text — Stash is the fastest path from zero to working.

Token cost: why MCP memory is cheaper than prompt stuffing

The naive approach to agent memory is "stuff everything into the system prompt." This is expensive and gets worse as memory grows:

# Prompt-stuffing approach (expensive):
system_prompt = f"""
You are an agent. Here is everything you know:
{all_memory_as_text}  # grows without bound, costs tokens every call
"""

# MCP pull approach (cheap):
# Agent starts with minimal context
# Calls stash_search() only when it needs specific memory
# Pays tokens only for what's actually retrieved

At our benchmark: ~192 tokens for a 500-record FTS5 search result vs. ~4,800+ for the same data stuffed in a prompt. When your agent runs hundreds of times, this matters.

Setting it up (2 minutes)

If you're building a Claude agent via the Agents SDK or the Claude API with tool use:

Sign up at stashlite.com → get your connector URL
Add the connector URL as an MCP server in your agent's tool config
The agent now has stash_add, stash_search, context(), and usage() as tools
Instruct the agent to save what matters, search before acting

No schema migration. No embedding pipeline. No infra to run.

If you're using Claude.ai (not the API): The same connector works. Settings → Integrations → Add custom integration → paste the URL. Your agent-style workflows in Claude.ai get the same memory store.

The limits you should know

Stash is intentionally simple. What it doesn't do:

No vector similarity search. It's FTS5 keyword search. If you need "find things semantically similar to this concept," you need a vector store.
No real-time subscriptions. Your agent polls; it doesn't get pushed.
Caps on free tier. 2,500 records / 50 queries per month. Fine for development; paid tier (£8/mo) is 100K/1K.
Single-user. Each account is one user's store. No team sharing in v1.

For production multi-agent systems at scale, evaluate accordingly. For individual agents, side projects, and prototypes: Stash is usually enough, and it's the fastest thing to wire up.

Give your agent memory in 2 minutes

Free tier. No credit card. Connector URL on signup.

Get your connector URL →