Stash / Blog · June 2026 · 7 min read

Is Claude Too Expensive? What Actually Helps (API vs Claude.ai)

"Claude is getting expensive" means two completely different things depending on how you use it — and the fixes are different. If you're using the API directly, you're paying per token and there are real engineering levers to pull. If you're on Claude.ai (the website, $20/month), your "cost" is rate limits, not a bill — and the levers are narrower.

Here's the honest breakdown.

Part 1 — Claude API users: paying per token

API If you're calling Claude via the API — building tools, running workflows, or using it as a backend — your bill grows with input tokens, output tokens, and model tier. The main levers:

Prompt caching (saves the most, requires engineering)

Claude's prompt caching feature lets you mark parts of your prompt as cacheable. Cached tokens cost roughly 10× less than uncached on re-use. If you have a large system prompt that stays the same across many calls, caching it can cut your bill substantially.

This requires API access and code changes — you're marking specific blocks with cache headers. It's worth it for long, stable system prompts, but not a one-afternoon job.

Model routing (pick the right model per task)

Claude Haiku (the small model) costs about 20× less per token than Claude Opus. If you're routing every request to Sonnet or Opus, you're probably overpaying for simple tasks like classification, reformatting, or short lookups. The engineering cost: a routing layer that picks the model based on the task type.

Batch API (50% cheaper, tolerates latency)

If your use case can wait a few hours, the Batch API processes requests at half the standard price. Good for overnight analytics runs or bulk document processing.

Context trimming (stop paying for stale history)

The most common hidden cost: conversation history that grows unbounded. By message 30, you're paying to re-read 20 turns that are no longer relevant. Summarising or truncating old turns programmatically can cut input tokens significantly — but again, requires engineering.

External memory instead of pasting documents

If you're pasting the same documents, databases, or reference material into every conversation, you're paying for those tokens on every call. External memory tools (MCP servers, retrieval systems) let Claude fetch only what's needed for the specific query. This is where Stash fits: instead of loading a 200-row database, Claude calls search("recent contacts") and gets 3 rows back.

Token comparison (preliminary, n=1): loading a 500-row Stash collection costs ~192 tokens. Loading the same data via Notion API costs ~4,800 tokens. That's a ~25× difference — which compounds fast if you're doing it repeatedly.

Part 2 — Claude.ai subscribers: paying per month, hitting rate limits

Claude.ai If you're using Claude at claude.ai — the website, with a $20/month Pro subscription — you don't have a token bill. You pay a flat rate and hit conversation rate limits instead.

Can Claude.ai users use prompt caching?

No. Prompt caching is an API feature. It's implemented at the infrastructure level when you call the API with specific cache-control headers. Claude.ai doesn't expose this to subscribers — you'd need API access and code to take advantage of it.

Honest answer: If you're a Claude.ai subscriber (not an API user), you cannot directly control prompt caching. It's not something you can toggle in the interface.

So what CAN Claude.ai subscribers do?

Your "cost" is rate limits: Claude.ai Pro gives you a higher message limit than free, but it's still finite. The way to get more from your subscription is to make each conversation go further — do more per message, not more messages.

Practically, this means:

Don't paste large documents into every conversation. If you're pasting a 5,000-word doc every time to answer a question, each message is burning capacity fast. Store it externally, retrieve what's needed.
Don't re-paste standing context. If you start every conversation with "I'm a developer at X company working on Y project…" — that's wasted message budget. MCP connectors can inject it automatically.
Use Projects. Claude's built-in Projects feature lets you store documents and a custom system prompt so you don't paste them every session. This doesn't reduce tokens per API call, but for Claude.ai users it means less manual overhead.

The one fix that works for both groups

Whether you pay per token (API) or per month (Claude.ai), the common thread is: stop loading large datasets into the context when you only need a slice of them.

That's what Stash does. You store your records — contacts, tasks, notes, Notion exports — in a fast, indexed store. Claude fetches what it needs with a tool call. For API users: fewer tokens per query. For Claude.ai users: less manual pasting, more done per message.

The honest summary

Technique	Requires code?	Works for API?	Works for Claude.ai?
Prompt caching	Yes	✓	✗
Model routing	Yes	✓	✗
Batch API	Yes	✓	✗
Context trimming	Yes	✓	Partially
Claude.ai Projects	No	N/A	✓ (reduces manual work)
External memory (Stash)	No	✓	✓

The only technique in the list that works for both groups without code: external memory via an MCP connector. Everything else requires either engineering work (API) or doesn't apply (Claude.ai).

If you're an API developer, see our full breakdown of all 8 API cost levers. If you're a Claude.ai subscriber who keeps pasting the same context, Stash is a 30-second fix.

Stop pasting. Start fetching.

Add Stash to Claude in 30 seconds — no engineering required. Your records stay in a fast store; Claude retrieves exactly what it needs.

Add Stash to Claude →