Claude for Data Scientists: Keep Project Context Between Sessions

Published June 2026 · 6 min read

Data scientists use Claude heavily: for code review, for explaining what a model is doing, for debugging pandas operations, and for thinking through feature engineering decisions. But every new conversation requires you to re-explain your data schema, re-describe your pipeline, and re-justify architectural choices you already reasoned through last week.

The problem isn't Claude's reasoning. It's that Claude's context is ephemeral. Your project knowledge isn't.

What data scientists lose between sessions

The re-briefing tax in technical AI work is especially high because:

Schema details matter — user_id vs uid, date columns as strings vs timestamps, which nullable columns exist and why
Decisions have history — "we tried XGBoost, it overfit, we moved to LightGBM with these hyperparameters, that's the current baseline"
Experiments accumulate — you've run 40 experiments; Claude can't reference them without being told
Data quality issues — "column X has ~15% missing, we impute with median but only for segment A"
Team conventions — naming conventions, review checklist, which tools you use

You either paste all of this into every session, or Claude gives you generic answers that don't fit your actual setup.
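To make the cost of a generic answer concrete, here is a minimal sketch of the imputation rule quoted above. It uses plain Python so it runs without dependencies; the column name `x`, the sample rows, and the helper `impute_median_for_segment` are all hypothetical, and it assumes the median is computed within segment A rather than over the whole table.

```python
from statistics import median

# Hypothetical rows; "segment" and "x" stand in for your real column names.
rows = [
    {"segment": "A", "x": 10.0},
    {"segment": "A", "x": None},
    {"segment": "A", "x": 30.0},
    {"segment": "B", "x": None},
    {"segment": "B", "x": 5.0},
]

def impute_median_for_segment(rows, column, segment):
    """Fill missing values with the segment's own median, only inside that segment."""
    observed = [
        r[column] for r in rows
        if r["segment"] == segment and r[column] is not None
    ]
    fill = median(observed)
    # Return new dicts so the input rows are left untouched.
    return [
        {**r, column: fill}
        if r["segment"] == segment and r[column] is None
        else dict(r)
        for r in rows
    ]

imputed = impute_median_for_segment(rows, "x", "A")
print(imputed[1]["x"])  # 20.0, the median of segment A's observed values
print(imputed[3]["x"])  # None, because segment B is deliberately left alone
```

An assistant that doesn't know the "only for segment A" rule would most likely suggest filling every missing value, which quietly changes segment B.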

The project-context pattern

Stash, added to Claude as an MCP connector, stores your project context as lightweight text records that Claude can search in milliseconds. A typical structure for a DS project:

Collection	What goes in it
`schema`	Column names, types, gotchas, join keys
`decisions`	Architectural choices and the reasoning behind them
`experiments`	What you tried, what happened, what it changed
`issues`	Known data quality problems, outstanding investigations
`context`	Team conventions, tooling, preferences (the `context()` standing record)
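One way to picture the table above is as a flat list of short text records, each tagged with its collection. The sketch below is only a mental model: the `Record` class, the `by_collection` helper, and the sample notes are hypothetical and say nothing about how Stash stores data internally.

```python
from dataclasses import dataclass

@dataclass
class Record:
    collection: str  # one of: schema, decisions, experiments, issues, context
    text: str        # the note itself, kept as plain text

# Illustrative notes, loosely based on the examples in this article.
store = [
    Record("schema", "events joins to users on user_id; users exposes it as uid"),
    Record("decisions", "XGBoost overfit, so LightGBM is the current baseline"),
    Record("experiments", "Exp-047: removing low-variance features hurt AUC"),
    Record("issues", "column X has ~15% missing; median impute for segment A only"),
    Record("context", "team naming conventions and review checklist"),
]

def by_collection(store, name):
    """Return only the records that belong to one collection."""
    return [r for r in store if r.collection == name]

for record in by_collection(store, "schema"):
    print(record.text)
```

Keeping each record small and tagged is what lets a session load `schema` and `decisions` without pulling in everything else.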

What this looks like in practice

Debugging session with project context loaded

You: "context(expand=['schema', 'decisions'])"

Claude: [loads your standing context + schema notes + key decisions for this project]

You: "I'm getting a KeyError on 'user_segment' when joining the events and users tables. Here's the stack trace: [paste]"

Claude: "Your schema notes say user_segment is only present in the users_v2 table, not users_v1. Your decisions log also shows you decided to drop v1 users in April, but this join looks like it's pulling from the combined source. Are you filtering before or after the join?"

That answer required knowing your schema and your past decisions. Without Stash, Claude would guess. With Stash, it reasons from your actual project state.
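The failure Claude describes can be reproduced in a few lines. This is a dependency-free sketch rather than the actual pipeline: the `source` field, the sample rows, and `join_events_to_users` are hypothetical stand-ins for a combined users table in which v1 rows have no `user_segment`.

```python
# Combined users source: v2 rows carry user_segment, v1 rows do not.
users = [
    {"uid": 1, "source": "users_v2", "user_segment": "A"},
    {"uid": 2, "source": "users_v1"},
]
events = [
    {"uid": 1, "event": "click"},
    {"uid": 2, "event": "view"},
]

def join_events_to_users(events, users):
    """Inner join on uid that attaches user_segment to each event."""
    by_uid = {u["uid"]: u for u in users}
    return [
        {**e, "user_segment": by_uid[e["uid"]]["user_segment"]}
        for e in events
        if e["uid"] in by_uid
    ]

# Joining against the combined source fails on the v1 row.
try:
    join_events_to_users(events, users)
    raised = False
except KeyError:
    raised = True

# Filtering to v2 users before the join avoids the error.
v2_users = [u for u in users if u["source"] == "users_v2"]
joined = join_events_to_users(events, v2_users)
print(raised)  # True
print(joined)  # [{'uid': 1, 'event': 'click', 'user_segment': 'A'}]
```

The fix is a one-line filter, but finding it depends on knowing which table carries the column and that v1 users were meant to be dropped.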

Experiment tracking (lightweight)

This isn't a replacement for MLflow or W&B — those tools track metrics, artefacts, and run parameters with precision. Stash tracks the narrative behind the experiments: why you ran them, what you concluded, what they changed about your approach.

Logging a new experiment result

You: "add to experiments: 'Exp-047 — tried removing low-variance features before PCA. AUC dropped 0.02. Conclusion: those features carry signal even with low marginal variance. Reverting. Baseline stays at Exp-043.'"

Claude: [adds to Stash] "Logged. That brings your experiment log to 47 entries. Do you want me to search the log for other cases where feature removal hurt performance?"

Next week, when you're back at this problem, you ask Claude to search your experiments. It returns the relevant entries — no Confluence page hunting, no scrolling Slack history.
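As a rough illustration of what "search your experiments" means, the sketch below runs a case-insensitive keyword match over a list of narrative entries. It is an assumption-heavy stand-in: the `search` function is hypothetical, the Exp-043 entry is a placeholder, and Stash's real search may rank or match quite differently.

```python
experiments = [
    # Placeholder entry; the article does not describe Exp-043 in detail.
    "Exp-043: current baseline.",
    "Exp-047: tried removing low-variance features before PCA. "
    "AUC dropped 0.02. Conclusion: those features carry signal even with "
    "low marginal variance. Reverting. Baseline stays at Exp-043.",
]

def search(records, query):
    """Return the records that contain every query term, ignoring case."""
    terms = query.lower().split()
    return [r for r in records if all(t in r.lower() for t in terms)]

hits = search(experiments, "features auc")
for hit in hits:
    print(hit[:7])  # Exp-047
```

Writing the conclusion into the entry, not just the metric, is what makes a later search useful.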

Schema documentation that stays current

Most data scientists have some kind of schema doc that gets stale. With Stash, you update it in conversation:

Quick schema update mid-session

You: "update my schema notes — we added a cohort_id column to the events table today. UUID, not nullable, backfilled to 'legacy' for rows before 2026-01."

Claude: [updates the record] "Schema updated. Worth noting: the events-users join will need to handle 'legacy' cohort_id values for historical data. Want me to draft the filter?"

The schema note is now current and available in every future session. No doc to update separately.
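A filter like the one Claude offers to draft might look like the sketch below. It assumes cohort_id is stored as text, so that the 'legacy' marker and real UUIDs can share a column; the sample events and the `parse_cohort_id` helper are hypothetical.

```python
import uuid

def parse_cohort_id(value):
    """Return a UUID for real cohorts, or None for the 'legacy' backfill marker."""
    if value == "legacy":
        return None
    return uuid.UUID(value)  # raises ValueError if the id is malformed

# Hypothetical events: one backfilled row and one row with a real cohort.
events = [
    {"event_id": 1, "cohort_id": "legacy"},
    {"event_id": 2, "cohort_id": "123e4567-e89b-12d3-a456-426614174000"},
]

# Split historical rows from rows that can be joined on a real cohort.
historical = [e for e in events if parse_cohort_id(e["cohort_id"]) is None]
current = [e for e in events if parse_cohort_id(e["cohort_id"]) is not None]

print(len(historical), len(current))  # 1 1
```

Parsing rather than string-comparing everywhere means a malformed id fails loudly instead of being treated as a valid cohort.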

Token efficiency note

Data scientists using Claude via the API (not just claude.ai) care about token costs. Stash is designed to be token-light: search returns only the matching records, not your entire store. A typical context() call for a project loads ~1,000-2,000 tokens of standing context, not megabytes of project history. This is intentional — the goal is to make Claude aware of your project without blowing your context window.
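The difference between loading everything and loading only matches is easy to estimate. The sketch below uses the common rule of thumb of about four characters per token, which is only an approximation and not Claude's actual tokenizer; the 200 placeholder records and the `estimate_tokens` helper are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers will give different counts."""
    return len(text) // chars_per_token

# 200 placeholder records of similar length, standing in for a project store.
store = [
    f"Exp-{i:03d}: placeholder note about run {i}. " * 5
    for i in range(1, 201)
]

# A search returns only the matching records, not the whole store.
matching = [r for r in store if "Exp-047" in r]

full_cost = sum(estimate_tokens(r) for r in store)
search_cost = sum(estimate_tokens(r) for r in matching)

print(len(matching))            # 1
print(search_cost < full_cost)  # True
```

The exact numbers depend on the tokenizer and on how long your notes are, but the ratio scales with how selective the search is.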

Free tier fit

A mid-size project with 50 schema notes, 100 experiment records, and 50 decisions comes to 200 records, well within the 2,500-record free tier. Multiple projects with separate collections stay organised and searchable without overlap.
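The headroom is simple to check. The record counts and the 2,500-record limit come from the paragraph above; the variable names are just for illustration.

```python
FREE_TIER_RECORDS = 2_500  # free-tier limit quoted in this article

project = {"schema": 50, "experiments": 100, "decisions": 50}

total = sum(project.values())
share = total / FREE_TIER_RECORDS
projects_that_fit = FREE_TIER_RECORDS // total

print(total)              # 200
print(f"{share:.0%}")     # 8%
print(projects_that_fit)  # 12
```

So one project of this size uses about 8% of the free tier, leaving room for roughly a dozen similar projects.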

Getting started

Add the MCP connector at stashlite.com
Sign in with Google — provisioned immediately
In Claude, create your first collection by sending "add to schema: [your first table name and key columns]"
Next session, run context(expand=['schema']) to load project context before diving in

Add Stash to Claude

Connector URL (Claude Settings → Integrations → Add MCP):

https://app.stashlite.com/mcp

Free tier. Google sign-in. No card required.