You opened Claude to finish something. Then: "You've reached your usage limit for this period." Frustrating — especially if you're paying for Pro.
This post explains what Claude's limits actually count, why some workflows drain them much faster than others, and the most effective ways to reduce your consumption without changing plans.
Claude doesn't limit you by messages or requests. It limits by tokens — the chunks of text (roughly ¾ of a word each) that the model processes.
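What counts toward your limit: both input tokens (everything Claude reads, including your message, attachments, and conversation history) and output tokens (everything Claude writes back).
A short question that carries a 30-page document attachment costs much more than a long message typed from scratch.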
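To make that comparison concrete, here is a rough back-of-the-envelope sketch. It uses the ¾-of-a-word heuristic from above; the 400 words per page and the word counts for each message are assumptions chosen for illustration, and a real tokenizer will give different numbers.

```python
# Heuristic from the text: one token is roughly 3/4 of a word.
# Real tokenizers differ, so treat every result as an estimate.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate a token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def exchange_cost(input_words: int, output_words: int) -> int:
    """Input and output both count toward the limit."""
    return estimate_tokens(input_words) + estimate_tokens(output_words)


# Assumption for illustration: about 400 words per attached page.
short_question_with_doc = exchange_cost(input_words=20 + 30 * 400, output_words=150)
long_typed_message = exchange_cost(input_words=600, output_words=150)

print("20-word question + 30-page attachment:", short_question_with_doc)
print("600-word typed message:", long_typed_message)
```

Under these assumptions the short question with the attachment costs more than ten times as much as the long typed message.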
A normal back-and-forth conversation with Claude might use 2,000–5,000 tokens per exchange. That's well within any plan's capacity for a day of normal use.
But several common patterns multiply that number fast:
If you regularly attach a spec sheet, a Notion export, a spreadsheet, or a codebase, that content gets fed into the model on every turn — even when the question is small. A 50-page document attached to ten messages across a conversation might cost 200,000 tokens, most of which is the document being re-read.
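The 200,000-token figure can be reproduced with simple arithmetic. The sketch below assumes about 400 tokens per page and about 100 tokens per question; both numbers are illustrative guesses, not measurements.

```python
def attachment_cost(doc_tokens: int, turns: int, tokens_per_question: int = 100) -> dict:
    """Split a conversation's input cost into document re-reads and questions."""
    document = doc_tokens * turns            # the document is read again on every turn
    questions = tokens_per_question * turns  # the part you actually typed
    return {
        "document": document,
        "questions": questions,
        "document_share": document / (document + questions),
    }


# Assumption: a 50-page document at roughly 400 tokens per page.
cost = attachment_cost(doc_tokens=50 * 400, turns=10)
print(cost["document"], "tokens spent re-reading the document")
print(f"{cost['document_share']:.1%} of input tokens are the document")
```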
Claude reads the full conversation history each turn. A conversation that started three hours ago and has dozens of back-and-forths carries that entire history as input. By turn 30, you might be paying for the same early context 30 times over.
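A small sketch shows why long conversations get expensive faster than you might expect. It assumes every exchange is the same size (3,000 tokens, inside the range quoted earlier) and that each turn re-sends all earlier exchanges in full; real conversations vary.

```python
def history_tokens_reread(turns: int, tokens_per_exchange: int) -> int:
    """Total history tokens sent as input over a whole conversation.

    At turn t, the t - 1 earlier exchanges are sent again.
    """
    return sum((turn - 1) * tokens_per_exchange for turn in range(1, turns + 1))


for turns in (5, 15, 30):
    total = history_tokens_reread(turns, tokens_per_exchange=3_000)
    print(f"{turns:>2} turns: {total:>9,} history tokens re-read")
```

The growth is quadratic in the number of turns. Under these assumptions, two 15-turn conversations re-read about half as much history as one 30-turn conversation.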
Custom instructions are injected at the start of every conversation. Anything you've put there — team protocols, contact lists, project notes, your entire personal background — adds to every single exchange.
If you use an MCP connector that returns full document contents, large JSON objects, or long search results, those get fed into Claude's context too. A poorly designed search tool that returns full articles instead of summaries can easily add 10,000 tokens per query.
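If you control the tool, one way to keep results small is to return a short snippet around the match rather than the whole article. The sketch below is a plain Python illustration of that idea, not an MCP server; the `Article` type, the `search` function, and the 40-word window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str


def summarize_hit(article: Article, query: str, window: int = 40) -> dict:
    """Return the title and a short snippet around the first match."""
    words = article.body.split()
    needle = query.lower()
    # Index of the first word containing the query (0 if none is found).
    idx = next((i for i, w in enumerate(words) if needle in w.lower()), 0)
    start = max(0, idx - window // 2)
    return {"title": article.title, "snippet": " ".join(words[start:start + window])}


def search(articles: list, query: str, limit: int = 3) -> list:
    """Return at most `limit` trimmed hits instead of full article bodies."""
    hits = [a for a in articles if query.lower() in a.body.lower()]
    return [summarize_hit(a, query) for a in hits[:limit]]


# Hypothetical data: one long article and one unrelated one.
docs = [
    Article("Refund policy", "intro " * 3000 + "refund requests take five days " + "outro " * 3000),
    Article("Shipping", "orders ship within two days"),
]
for hit in search(docs, "refund"):
    print(hit["title"], "->", len(hit["snippet"].split()), "words returned")
```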
Anthropic doesn't publish exact numbers for Pro, and it adjusts the limits based on server load. What is published is that usage resets periodically (typically every five hours) and that Pro gives you more total capacity than the free plan.
In practice, hitting the limit quickly almost always points to one of the patterns above — not to a plan that's too small. Before upgrading, it's worth checking whether the work itself is unusually token-heavy.
A 500-row Notion database, attached as a file or pasted as context, adds roughly 50,000–100,000 tokens to your conversation — equivalent to 20–40 normal exchanges. If you're doing that every time you ask Claude a question about your data, you'll hit limits fast regardless of plan.
Long conversation histories are token sinks. When you switch context — new topic, new task — start a new conversation rather than continuing in an existing one. The history of the old task isn't helping the new one; it's just adding tokens.
Custom instructions load on every conversation. They're the right place for how-you-work instructions ("prefer bullet points", "always ask for clarification before writing"). They're a poor place for reference data that changes or that you only need sometimes.
A 2,000-token contact list in custom instructions means every conversation you ever start pays for that contact list — even when you're writing code.
If you have a large reference document, don't attach it to every message. Attach it once at the start of a conversation, ask all your questions about it there, then end that conversation. Better still, keep the data somewhere Claude can query just the relevant piece instead of reading the whole thing.
Claude's default output style can be verbose. Adding "be concise" or "3 sentences max" to your prompts reduces output tokens and speeds up responses. For structured data, ask for JSON or a table rather than prose.
This is the highest-leverage change for people who regularly work with large datasets, contact lists, project notes, or reference databases.
Instead of attaching or pasting the full dataset, connect a tool that can search it and return only the relevant rows. The difference in tokens is large:
| Approach | Tokens per query | What Claude reads |
|---|---|---|
| Paste full 500-row export | ~80,000 | All 500 rows every time |
| Attach the CSV | ~50,000–80,000 | All 500 rows (reformatted) |
| Searchable MCP store | ~200–400 | Only the 3–5 matching rows |
The search approach retrieves the same answer for about 200–400 tokens, roughly 200 to 400 times fewer than the ~80,000 tokens of pasting the full dataset. Those savings compound across every conversation where you touch that data.
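The ratios can be checked directly from the table. This sketch only uses the table's own estimates; the 20-query workload at the end is an assumption added for illustration.

```python
# (low, high) tokens per query, taken from the table above.
TOKENS_PER_QUERY = {
    "paste_full_export": (80_000, 80_000),
    "attach_csv": (50_000, 80_000),
    "searchable_store": (200, 400),
}


def reduction_range(baseline: str, alternative: str) -> tuple:
    """Smallest and largest reduction factor when switching approaches."""
    base_low, base_high = TOKENS_PER_QUERY[baseline]
    alt_low, alt_high = TOKENS_PER_QUERY[alternative]
    return base_low / alt_high, base_high / alt_low


def total_tokens(approach: str, queries: int) -> tuple:
    """Low and high token totals for a number of queries."""
    low, high = TOKENS_PER_QUERY[approach]
    return low * queries, high * queries


print("paste vs. search:", reduction_range("paste_full_export", "searchable_store"))
print("attach vs. search:", reduction_range("attach_csv", "searchable_store"))
# Assumed workload: 20 queries against the same data.
print("20 pasted queries:", total_tokens("paste_full_export", 20))
print("20 search queries:", total_tokens("searchable_store", 20))
```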
Here's the pattern that most often puts people over their limits, and the fix:
Before: Paste your 300-contact CRM into Claude's context, ask "what's Alice's current status?" — Claude reads 300 contacts, answers one question. Repeat for each query. 30,000+ tokens each time.
After: Store the contacts in a connected tool. Claude calls search("Alice"), gets back 1 matching record. Answer delivered. 300 tokens total.
Same result. Roughly 100× fewer tokens. Limit pressure drops dramatically.
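The before/after difference can be sketched in a few lines. Everything here is hypothetical: the `Contact` fields, the synthetic 300-record list, the in-memory `search` function, and the four-characters-per-token estimate are assumptions for illustration, not how any particular connector works.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Contact:
    name: str
    company: str
    status: str


def search(contacts: list, term: str) -> list:
    """Return only the contacts whose name contains the search term."""
    needle = term.lower()
    return [c for c in contacts if needle in c.name.lower()]


def rough_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return round(len(text) / 4)


def payload(contacts: list) -> str:
    """Serialize contacts the way they might be handed to the model."""
    return json.dumps([asdict(c) for c in contacts])


# Synthetic data: 299 filler contacts plus the one we care about.
crm = [Contact(f"Person {i}", f"Company {i}", "active") for i in range(299)]
crm.append(Contact("Alice Example", "Company 7", "renewal pending"))

before = rough_tokens(payload(crm))                   # paste everything
after = rough_tokens(payload(search(crm, "Alice")))   # return one record
print(f"before: ~{before} tokens, after: ~{after} tokens")
```

The synthetic records are much shorter than real CRM rows, so the absolute numbers are smaller than the ones quoted above, but the shape of the saving is the same: the model reads one record instead of 300.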
If you've cleaned up your token-heavy patterns and still routinely hit limits, a few options:
Claude's limits are real, but they rarely require a plan upgrade. Almost always, hitting them quickly is a symptom of a token-heavy workflow, not a plan that's too small.
The most effective single change for most users is getting large reference data out of Claude's context (out of custom instructions, out of attachments, out of pasted text) and into a form Claude can query on demand. The difference per query isn't marginal; in the examples above it ranges from roughly 100× to 400×.
Stash is built for this pattern. It's a remote record store Claude connects to over MCP — you add the connector URL once, and Claude can search your records with search("term") or pull standing context with context() instead of reading everything every time.
Paste in your data via Claude, then ask questions against it. Token consumption drops to the search-response size, not the full dataset size.