
Claude Rate Limits Explained: Why You Hit Them and How to Stay Under

June 2026  ·  Stash

You opened Claude to finish something. Then: "You've reached your usage limit for this period." Frustrating — especially if you're paying for Pro.

This post explains what Claude's limits actually count, why some workflows drain them much faster than others, and the most effective ways to reduce your consumption without changing plans.

What Claude's limits actually measure

Claude doesn't limit you by messages or requests. It limits by tokens — the chunks of text (roughly ¾ of a word each) that the model processes.
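For a quick sense of scale, the ¾-of-a-word rule of thumb can be turned into a rough estimator. This is only a sketch: the function name is made up for illustration, and it counts whitespace-separated words rather than running a real tokenizer, so treat the result as a ballpark figure.

```python
import math

# Rule of thumb from the paragraph above: one token is roughly 3/4 of a word.
# This is an approximation, not Claude's actual tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = "word " * 750  # 750 words
    print(estimate_tokens(sample))  # 750 / 0.75 = 1000
```

Real counts vary with language, formatting, and how much of the text is code or numbers, so an estimate like this is best used for comparing workflows rather than predicting an exact bill.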

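What counts toward your limit: input tokens (everything Claude reads on a turn, including your message, attachments, and conversation history) and output tokens (everything Claude writes back).

Both count. A short question that carries a 30-page document attachment costs much more than a long message typed from scratch.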

Why heavy tasks hit limits faster

A normal back-and-forth conversation with Claude might use 2,000–5,000 tokens per exchange. That's well within any plan's capacity for a day of normal use.

But several common patterns multiply that number fast:

1. Large files or documents attached to every message

If you regularly attach a spec sheet, a Notion export, a spreadsheet, or a codebase, that content gets fed into the model on every turn — even when the question is small. A 50-page document attached to ten messages across a conversation might cost 200,000 tokens, most of which is the document being re-read.

2. Long conversation histories

Claude reads the full conversation history each turn. A conversation that started three hours ago and has dozens of back-and-forths carries that entire history as input. By turn 30, you might be paying for the same early context 30 times over.

3. Custom instructions with large reference data

Custom instructions are injected at the start of every conversation. Anything you've put there — team protocols, contact lists, project notes, your entire personal background — adds to every single exchange.

4. Tools that return verbose results

If you use an MCP connector that returns full document contents, large JSON objects, or long search results, those get fed into Claude's context too. A poorly designed search tool that returns full articles instead of summaries can easily add 10,000 tokens per query.
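If you build or configure your own tools, the fix for the fourth pattern is usually to trim results before they reach the model. Here is a minimal sketch of that idea. The result shape (dicts with "title" and "body" keys), the function name, and the default limits are all assumptions for illustration, not part of MCP or any particular connector.

```python
def trim_results(results: list[dict], max_results: int = 5, max_chars: int = 300) -> list[dict]:
    """Keep only the first few hits and shorten each body to a snippet.

    Assumes each result is a dict with "title" and "body" keys
    (a hypothetical shape, chosen only for this example).
    """
    trimmed = []
    for hit in results[:max_results]:
        body = hit.get("body", "")
        if len(body) > max_chars:
            # Cut long bodies and mark the cut so the model knows text is missing.
            snippet = body[:max_chars].rstrip() + "..."
        else:
            snippet = body
        trimmed.append({"title": hit.get("title", ""), "snippet": snippet})
    return trimmed


if __name__ == "__main__":
    raw = [{"title": f"Article {i}", "body": "x" * 10_000} for i in range(20)]
    small = trim_results(raw)
    print(len(small), len(small[0]["snippet"]))  # 5 results, 303 characters each
```

The tool can still expose a separate "fetch full document" call for the rare case where the snippet isn't enough, so the expensive path is opt-in rather than the default.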

How Claude Pro limits work in practice

Anthropic doesn't publish exact numbers, and it adjusts the limits based on server load. What is published is that limits reset periodically (typically every 5 or 8 hours) and that Pro gets more total capacity than the free plan.

In practice, hitting the limit quickly almost always points to one of the patterns above — not to a plan that's too small. Before upgrading, it's worth checking whether the work itself is unusually token-heavy.

A 500-row Notion database, attached as a file or pasted as context, adds roughly 50,000–100,000 tokens to your conversation — equivalent to 20–40 normal exchanges. If you're doing that every time you ask Claude a question about your data, you'll hit limits fast regardless of plan.
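To check whether your own workflow is token-heavy, the patterns from the previous section can be put into a rough cost model. The sketch below assumes every exchange is the same size, that each turn re-reads the whole history plus any attachment, and that nothing is cached or discounted along the way. Those are simplifications, and the function name is hypothetical.

```python
def conversation_input_tokens(turns: int, tokens_per_exchange: int, attachment_tokens: int = 0) -> int:
    """Estimate total input tokens read over a whole conversation.

    Naive model: on turn n, the input is the attachment (if any) plus
    n exchanges' worth of text (the history so far and the new message).
    """
    total = 0
    for turn in range(1, turns + 1):
        history = turn * tokens_per_exchange
        total += attachment_tokens + history
    return total


if __name__ == "__main__":
    # One 30-turn conversation vs. 30 single-turn conversations, 2,000 tokens per exchange.
    print(conversation_input_tokens(30, 2_000))       # 930000
    print(30 * conversation_input_tokens(1, 2_000))   # 60000

    # Attachment cost alone: a document of ~20,000 tokens re-read on 10 turns.
    print(conversation_input_tokens(10, 0, 20_000))   # 200000
```

The 20,000-token document size is the figure implied by the 50-page example earlier (200,000 tokens across ten messages). The first pair of numbers shows why long threads are expensive: under this model, history cost grows with the square of the number of turns.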

Five ways to reduce token consumption

1. Start fresh conversations for new tasks

Long conversation histories are token sinks. When you switch context — new topic, new task — start a new conversation rather than continuing in an existing one. The history of the old task isn't helping the new one; it's just adding tokens.

2. Move reference data out of custom instructions

Custom instructions load on every conversation. They're the right place for how-you-work instructions ("prefer bullet points", "always ask for clarification before writing"). They're a poor place for reference data that changes or that you only need sometimes.

A 2,000-token contact list in custom instructions means every conversation you ever start pays for that contact list — even when you're writing code.

3. Attach files only when needed

If you have a large reference document, don't attach it to every message. Attach it once at the start of a conversation, ask your questions, then end that conversation instead of carrying the document into unrelated work. Or, better, keep the data in a place where Claude can query just the relevant piece instead of reading the whole thing.

4. Ask for concise outputs

Claude's default output style can be verbose. Adding "be concise" or "3 sentences max" to your prompts reduces output tokens and speeds up responses. For structured data, ask for JSON or a table rather than prose.

5. Use searchable external storage instead of pasting data

This is the highest-leverage change for people who regularly work with large datasets, contact lists, project notes, or reference databases.

Instead of attaching or pasting the full dataset, connect a tool that can search it and return only the relevant rows. The difference in tokens is large:

| Approach | Tokens per query | What Claude reads |
| --- | --- | --- |
| Paste full 500-row export | ~80,000 | All 500 rows every time |
| Attach the CSV | ~50,000–80,000 | All 500 rows (reformatted) |
| Searchable MCP store | ~200–400 | Only the 3–5 matching rows |

The search approach costs Claude about 200–400 tokens to retrieve the same answer. Against the ~80,000 tokens of a full paste, that is roughly 200–400× fewer. Those savings compound across every conversation where you touch that data.
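The arithmetic behind that comparison is simple enough to run yourself with your own numbers. The sketch below uses the per-query figures from the table above; the 20-queries-per-day volume is an assumption picked only to show how the gap compounds.

```python
PASTE_TOKENS = 80_000        # full 500-row export, per the table
SEARCH_TOKENS = (200, 400)   # searchable store, low and high estimate per the table
QUERIES_PER_DAY = 20         # assumed volume, for illustration only


def savings_factor(full_tokens: int, searched_tokens: int) -> float:
    """How many times fewer tokens the search approach uses per query."""
    return full_tokens / searched_tokens


if __name__ == "__main__":
    low = savings_factor(PASTE_TOKENS, SEARCH_TOKENS[1])
    high = savings_factor(PASTE_TOKENS, SEARCH_TOKENS[0])
    print(f"{low:.0f}x to {high:.0f}x fewer tokens per query")  # 200x to 400x

    print(PASTE_TOKENS * QUERIES_PER_DAY)       # 1600000 tokens/day when pasting
    print(SEARCH_TOKENS[1] * QUERIES_PER_DAY)   # 8000 tokens/day when searching
```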

What this looks like in practice

Here's the pattern that most often puts people over their limits, and the fix:

Before: Paste your 300-contact CRM into Claude's context, ask "what's Alice's current status?" — Claude reads 300 contacts, answers one question. Repeat for each query. 30,000+ tokens each time.

After: Store the contacts in a connected tool. Claude calls search("Alice"), gets back 1 matching record. Answer delivered. 300 tokens total.

Same result. Roughly 100× fewer tokens. Limit pressure drops dramatically.
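To make the "after" case concrete, here is a toy in-memory version of a searchable store. It is a sketch of the pattern only: the class, the field names, and the sample contacts are invented for this example, and a real connector would run the search on a server and return the matches to Claude as a tool result.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    status: str
    notes: str = ""


class ContactStore:
    """In-memory stand-in for a searchable record store (hypothetical, not a real API)."""

    def __init__(self, contacts):
        self._contacts = list(contacts)

    def search(self, term: str, limit: int = 5) -> list[Contact]:
        """Return at most `limit` contacts whose name or notes contain the term."""
        needle = term.casefold()
        hits = [
            c for c in self._contacts
            if needle in c.name.casefold() or needle in c.notes.casefold()
        ]
        return hits[:limit]


if __name__ == "__main__":
    # 300 made-up contacts; only one of them is Alice.
    contacts = [Contact("Alice Nguyen", "contract sent")]
    contacts += [Contact(f"Contact {i}", "lead") for i in range(299)]
    store = ContactStore(contacts)

    matches = store.search("Alice")
    print(len(matches), matches[0].status)  # 1 contract sent
```

The point is the shape of the exchange: the model sends a short query and receives one record, so the other 299 contacts never enter the context at all.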

If you still hit the limit

If you've cleaned up your token-heavy patterns and still routinely hit limits, a few options:

The honest answer about limits

Claude's limits are real, but they rarely require a plan upgrade. Almost always, hitting them quickly is a symptom of a token-heavy workflow, not a plan that's too small.

The most effective single change for most users is getting large reference data out of Claude's context — out of custom instructions, out of attachments, out of pasted text — and into a form Claude can query on demand. The difference per query isn't marginal; it's usually 50–200×.

Stash is built for this pattern. It's a remote record store Claude connects to over MCP — you add the connector URL once, and Claude can search your records with search("term") or pull standing context with context() instead of reading everything every time.

Paste in your data via Claude, then ask questions against it. Token consumption drops to the search-response size, not the full dataset size. Add Stash to Claude →
