Persistent Memory for AI Agents

June 2026 · 7 min read · For developers building with Claude API / Claude Agents

Your AI agent runs a task, does useful work, then exits. Next run: blank slate. It has no memory of what it learned last time, what state it was in, which contacts it already processed, or which decisions it already made.

This is the agent memory problem. It's not a new idea — every serious agent framework mentions it. But most solutions require you to stand up a vector database, wire up embeddings, build a retrieval layer, and manage all of it in production. That's a lot of infrastructure for "remember what you did yesterday."

Stash is a different approach: a hosted key-value + full-text store, accessible to your agent via MCP, with zero infrastructure to manage.

What agents actually need to remember

Most agent memory falls into a few categories:

CategoryExampleHow long?
EpisodicWhat happened last runHours–days
ProceduralHow to handle edge casesWeeks–months
SemanticFacts about the domain or usersLong-term
WorkingCurrent task state (resume after crash)Until done

You don't always need semantic search over all of it. Most agent memory lookups are either "get the most recent state" (episodic) or "find the record about X" (semantic). Stash handles both with context() for standing facts and search() for lookup.

The pattern: agents as MCP clients

Claude agents can call MCP tools. If you wire Stash as an MCP server, your agent gets stash_add, stash_search, context(), and usage() as native tools — no custom code, no API calls, no database queries.

The agent decides what to save. It decides when to load. The memory lives in Stash, survives between runs, and is searchable by text.

# Example: an agent that processes customer requests
# Run 1: Agent processes request, saves outcome

Agent prompt (run 1):
"Process this request from Alex Chen about their order #8821.
When done, use stash_add to save: what you did, the outcome, and any notes
about their account that would be useful next time."

Agent response: "Resolved the refund. Saving to Stash..."
→ stash_add(collection="customer_notes", content="Alex Chen — order 8821
   refund processed 2026-06-08. Preferred contact: email. Noted: sensitive
   about shipping delays, acknowledge proactively next time.")

# Run 2 (a week later): fresh context, but memory survives
Agent prompt (run 2):
"Alex Chen just submitted another request — order #9102."

Agent (calls search first):
→ stash_search("Alex Chen")
← "Alex Chen — order 8821, refund processed. Preferred: email. Sensitive
   about shipping delays..."

Agent: "Found previous history. Alex had a shipping issue before — I'll
acknowledge that upfront before diving into #9102."

The agent gets institutional memory. It gets better over time. And you didn't build a database.

Working state: resuming after failure

Long-running agents (processing a queue, doing research, building a report) crash. When they restart, they start over. With Stash:

# Agent saves checkpoint on each significant step
→ stash_add(collection="agent_state", content=json.dumps({
    "run_id": "run_20260608_001",
    "step": "processed_items_47",
    "last_item_id": "item_1047",
    "results_so_far": [...summary...]
}))

# On restart, agent loads the checkpoint
→ stash_search("run_20260608_001")
← Returns the checkpoint — agent resumes from step 47, not step 0

No Redis. No a database. No custom checkpoint logic beyond "save a record, search for it later."

Comparison: Stash vs. alternatives for agent memory

OptionSetupSearchCost/moMCP-native?
Stash (free tier)Paste URLFTS5 full-text£0
Stash ProPaste URLFTS5 full-text£8
PineconeSignup + API + embeddingsVector similarity$25+Custom code
Supabase + pgvectorDB setup + schema + embeddingsVector + queries$25+Custom code
a structured store local fileCode + schemaFTS5 (if configured)$0No
Mem0Signup + SDK integrationSemantic$9+No

Stash is not the most powerful option. If you need sub-millisecond vector search over 10M embeddings, use Pinecone. But for the common case — agent memory that needs to survive runs and be searchable by text — Stash is the fastest path from zero to working.

Token cost: why MCP memory is cheaper than prompt stuffing

The naive approach to agent memory is "stuff everything into the system prompt." This is expensive and gets worse as memory grows:

# Prompt-stuffing approach (expensive):
system_prompt = f"""
You are an agent. Here is everything you know:
{all_memory_as_text}  # grows without bound, costs tokens every call
"""

# MCP pull approach (cheap):
# Agent starts with minimal context
# Calls stash_search() only when it needs specific memory
# Pays tokens only for what's actually retrieved

At our benchmark: ~192 tokens for a 500-record FTS5 search result vs. ~4,800+ for the same data stuffed in a prompt. When your agent runs hundreds of times, this matters.

Setting it up (2 minutes)

If you're building a Claude agent via the Agents SDK or the Claude API with tool use:

  1. Sign up at stashlite.com → get your connector URL
  2. Add the connector URL as an MCP server in your agent's tool config
  3. The agent now has stash_add, stash_search, context(), and usage() as tools
  4. Instruct the agent to save what matters, search before acting

No schema migration. No embedding pipeline. No infra to run.

If you're using Claude.ai (not the API): The same connector works. Settings → Integrations → Add custom integration → paste the URL. Your agent-style workflows in Claude.ai get the same memory store.

The limits you should know

Stash is intentionally simple. What it doesn't do:

For production multi-agent systems at scale, evaluate accordingly. For individual agents, side projects, and prototypes: Stash is usually enough, and it's the fastest thing to wire up.

Give your agent memory in 2 minutes

Free tier. No credit card. Connector URL on signup.

Get your connector URL →