"Claude is getting expensive" means two completely different things depending on how you use it — and the fixes are different. If you're using the API directly, you're paying per token and there are real engineering levers to pull. If you're on Claude.ai (the website, $20/month), your "cost" is rate limits, not a bill — and the levers are narrower.
Here's the honest breakdown.
API If you're calling Claude via the API — building tools, running workflows, or using it as a backend — your bill grows with input tokens, output tokens, and model tier. The main levers:
Claude's prompt caching feature lets you mark parts of your prompt as cacheable. Cached tokens cost roughly 10× less than uncached on re-use. If you have a large system prompt that stays the same across many calls, caching it can cut your bill substantially.
This requires API access and code changes — you're marking specific blocks with cache headers. It's worth it for long, stable system prompts, but not a one-afternoon job.
Claude Haiku (the small model) costs about 20× less per token than Claude Opus. If you're routing every request to Sonnet or Opus, you're probably overpaying for simple tasks like classification, reformatting, or short lookups. The engineering cost: a routing layer that picks the model based on the task type.
If your use case can wait a few hours, the Batch API processes requests at half the standard price. Good for overnight analytics runs or bulk document processing.
The most common hidden cost: conversation history that grows unbounded. By message 30, you're paying to re-read 20 turns that are no longer relevant. Summarising or truncating old turns programmatically can cut input tokens significantly — but again, requires engineering.
If you're pasting the same documents, databases, or reference material into every conversation, you're paying for those tokens on every call. External memory tools (MCP servers, retrieval systems) let Claude fetch only what's needed for the specific query. This is where Stash fits: instead of loading a 200-row database, Claude calls search("recent contacts") and gets 3 rows back.
Token comparison (preliminary, n=1): loading a 500-row Stash collection costs ~192 tokens. Loading the same data via Notion API costs ~4,800 tokens. That's a ~25× difference — which compounds fast if you're doing it repeatedly.
Claude.ai If you're using Claude at claude.ai — the website, with a $20/month Pro subscription — you don't have a token bill. You pay a flat rate and hit conversation rate limits instead.
No. Prompt caching is an API feature. It's implemented at the infrastructure level when you call the API with specific cache-control headers. Claude.ai doesn't expose this to subscribers — you'd need API access and code to take advantage of it.
Your "cost" is rate limits: Claude.ai Pro gives you a higher message limit than free, but it's still finite. The way to get more from your subscription is to make each conversation go further — do more per message, not more messages.
Practically, this means:
Whether you pay per token (API) or per month (Claude.ai), the common thread is: stop loading large datasets into the context when you only need a slice of them.
That's what Stash does. You store your records — contacts, tasks, notes, Notion exports — in a fast, indexed store. Claude fetches what it needs with a tool call. For API users: fewer tokens per query. For Claude.ai users: less manual pasting, more done per message.
| Technique | Requires code? | Works for API? | Works for Claude.ai? |
|---|---|---|---|
| Prompt caching | Yes | ✓ | ✗ |
| Model routing | Yes | ✓ | ✗ |
| Batch API | Yes | ✓ | ✗ |
| Context trimming | Yes | ✓ | Partially |
| Claude.ai Projects | No | N/A | ✓ (reduces manual work) |
| External memory (Stash) | No | ✓ | ✓ |
The only technique in the list that works for both groups without code: external memory via an MCP connector. Everything else requires either engineering work (API) or doesn't apply (Claude.ai).
If you're an API developer, see our full breakdown of all 8 API cost levers. If you're a Claude.ai subscriber who keeps pasting the same context, Stash is a 30-second fix.
Add Stash to Claude in 30 seconds — no engineering required. Your records stay in a fast store; Claude retrieves exactly what it needs.
Add Stash to Claude →Related: 8 Ways to Reduce Your Claude API Costs · Your Notion Database Is Making Claude Expensive