Getting started with Engram: persistent memory for AI-assisted development

Your AI forgets everything between sessions. Here's how Engram fixes that — and why compound context changes how you work with AI tools.

Every AI session starts from zero. You open Claude Code, Cursor, or Windsurf, and before you can get to the actual problem, you spend the first few turns re-explaining: the architecture you're using, the decision you made last week, the constraint that's been baked in since day one. It's the cognitive overhead nobody talks about.

Engram solves this. It's a persistent memory layer that sits between your sessions — extracting structured facts from conversations and storing them in a knowledge graph you can query at any time. Your AI editor pulls that context automatically, or you inject it explicitly. Either way, you stop explaining and start building.

How it works

Engram runs as a local CLI tool. When you finish a session — or even mid-session — you point it at a transcript or markdown file:

engram extract my-project transcript.txt

Under the hood, a fast LLM reads the content and pulls out structured nodes:

  • decisions — "we're using Postgres instead of SQLite"
  • constraints — "must ship before the July release"
  • trade_offs — "chose JWT over sessions for stateless scaling; accepted the refresh token complexity"
  • implementations — "auth middleware lives in middleware/auth.ts"
  • lessons_learned — "the webhook retry logic broke under concurrent requests; fixed with a queue"
  • open_questions — "still need to decide on the caching strategy"
  • references — pointers to files, docs, external resources

These nodes live in a local knowledge graph. Duplicate or superseded facts are merged automatically — if you make a different decision later, the graph reflects the current state, not the history.
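The supersede-and-merge behavior can be pictured with a small sketch. This is not Engram's actual code — the node shape and the "same kind + same topic" merge rule here are illustrative assumptions — but it shows why the graph holds current state rather than history:

```python
from dataclasses import dataclass

@dataclass
class Node:
    kind: str   # e.g. "decision", "constraint"
    topic: str  # what the fact is about, e.g. "database"
    text: str   # the fact itself

def merge(graph: list[Node], incoming: Node) -> list[Node]:
    """A new fact about the same topic and of the same kind
    supersedes the old one; unrelated facts are kept."""
    kept = [n for n in graph
            if not (n.kind == incoming.kind and n.topic == incoming.topic)]
    kept.append(incoming)
    return kept

graph = [Node("decision", "database", "use SQLite")]
graph = merge(graph, Node("decision", "database", "use Postgres instead of SQLite"))
# graph now holds only the Postgres decision
```

The interesting design question is what counts as "the same topic" — in practice that matching is fuzzy and presumably done by the extraction model, not by exact string comparison as above.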

Getting set up in 5 minutes

Install Engram with pip:

pip install engram
engram --version

Create a project for your codebase:

engram init my-project

Configure the extraction model. Gemini Flash is fast and cheap (~$0.001 per session). Add your key to ~/.engram/config.yaml:

model: gemini/gemini-2.5-flash
api_key: YOUR_GEMINI_API_KEY

Extract your first session:

engram extract my-project session.txt

Query it:

engram query my-project "what decisions have we made about auth?"
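Conceptually, a query filters the graph down to the nodes relevant to your question. A deliberately naive keyword version makes the shape of the operation concrete (real retrieval presumably ranks by semantic similarity rather than shared words; the dict layout is an assumption):

```python
def query(graph: list[dict], question: str) -> list[dict]:
    """Return nodes whose text shares at least one word with the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return [n for n in graph if words & set(n["text"].lower().split())]

graph = [
    {"kind": "decision", "text": "chose JWT over sessions for stateless auth"},
    {"kind": "constraint", "text": "must ship before the July release"},
]
hits = query(graph, "what decisions have we made about auth?")
# hits contains only the JWT decision
```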

Connecting to your editor

The fastest path is the MCP server. Start it:

engram mcp-serve

Then add Engram to your MCP config. For Claude Code, that's a .mcp.json in your project root:

{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp-serve"]
    }
  }
}

Now engram_query and engram_extract are available as tools in every Claude Code session. The AI can pull context when it needs it, or you can invoke the tools explicitly.
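Under MCP, tool invocations travel as JSON-RPC 2.0 `tools/call` requests over the server's stdio transport. A request for `engram_query` would look roughly like this — the argument names are assumptions on my part; check the tool schema that `engram mcp-serve` advertises:

```python
import json

# Hypothetical tools/call request; "project" and "question"
# are assumed argument names, not a documented schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "engram_query",
        "arguments": {
            "project": "my-project",
            "question": "what decisions have we made about auth?",
        },
    },
}
print(json.dumps(request, indent=2))
```

You never write this by hand — the editor constructs it for you — but it's useful to know the wire format when debugging why a tool call isn't reaching the server.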

Cursor and Windsurf use the same MCP config format — see the MCP Server docs for editor-specific setup.

Why it compounds

The real value isn't any single session. It's what happens after 20 sessions. By that point, Engram knows your architecture, your constraints, your past mistakes, and your open questions. New sessions don't start from zero — they start from everything you've already figured out.

The AI stops asking what your stack is. It stops suggesting patterns you've already ruled out. It starts contributing at a higher level because it has the context to operate at a higher level.

That's the compounding effect. The longer you use Engram, the more leverage you get from every session.

Next steps