Engram · by Unbidden AI
Persistent memory for AI-assisted development. Works with any OpenAI-compatible model via MCP or REST API.
# Query your project memory
engram query my-project "what auth approach did we decide on?"
→ Auth: JWT, stateless, multi-region. Decided 2026-03-01.
→ Database: PostgreSQL via Supabase. Rejected SQLite.
The problem
Engram was built specifically for each of the problems below. Not workarounds: architectural fixes.
Your 4K context fills up by turn 15, and every session starts with re-explaining what you built two days ago. Engram makes context effectively unlimited: only the relevant facts, no noise.
You're paying for every token in history. Stuffing prior context into every call is expensive. Engram cuts that by 60–80% by sending only what matters for the current task.
By week 6, the AI contradicts decisions from week 1. You've said "we're using PostgreSQL" six times. Engram makes that decision permanent: it retrieves in week 12 exactly as it did in week 1.
How it works
A graph-based knowledge store with semantic retrieval and temporal decay scoring.
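One plausible shape for that scoring, as an illustration rather than Engram's published formula: score = semantic_similarity × exp(-age / half_life). Recent, relevant facts rank highest; stale facts fade unless reinforced; superseded facts are removed outright rather than left to decay.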
One config block in Claude Code, Cursor, Windsurf, or Continue.dev. The MCP server runs as a background process — zero impact on your editor.
At session end, engram_extract processes your conversation. Decisions, constraints, implementation details, and open questions are extracted and merged into a persistent knowledge graph.
engram_query returns the facts most relevant to your current task. Superseded facts retire automatically — old decisions don't clutter the context.
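For intuition, a stored fact might look something like the record below. The field names are illustrative assumptions, not Engram's actual schema; the point is the metadata each fact carries: a type, a timestamp that feeds decay scoring, and a status flag that retirement flips.
{
  "type": "decision",
  "text": "Auth: JWT, stateless, multi-region",
  "project": "my-project",
  "recorded": "2026-03-01",
  "status": "active",
  "supersedes": null
}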
Integrations
If it speaks MCP or HTTP, it works with Engram.
Claude Code — ~/.claude/settings.json
{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp-serve"],
      "env": {
        "ENGRAM_PROJECT": "my-project"
      }
    }
  }
}
OpenClaw — openclaw.json
{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp-serve"],
      "env": {
        "ENGRAM_PROJECT": "my-project"
      }
    }
  }
}
See the full integration guide for Cursor, Windsurf, Continue.dev, Cline, and Zed — with exact config file paths for each tool. Using Hermes? See the Hermes MemoryProvider guide →
Get started
Local mode. No account required. No credit card.
FAQ
How is this different from RAG?
RAG retrieves from a document corpus — you put documents in, it fetches chunks when asked. Engram builds memory from conversations — it watches what you're working on, extracts decisions and facts as you go, and surfaces them in future sessions automatically. It's session memory, not document search.
How is this different from built-in memory features?
Built-in memory tools summarize old context or drop it when the context window fills. Architectural decisions from early in a project eventually disappear. Engram extracts structured facts — it doesn't summarize or discard. A decision from week 1 retrieves just as accurately in week 12 as it did on day 2.
Where does my data go?
Local mode: Nothing leaves your machine. The MCP server runs locally, extraction calls your configured LLM endpoint, and the memory store is a local file. Hosted mode: Session text is sent to the Engram API for extraction. Your data is always exportable via engram export.
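As a sketch, a backup could be a one-liner; the output format and exact arguments here are assumptions, so check the docs:
engram export my-project > my-project-memory.json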
Can I use my own model?
Yes. Engram works with any OpenAI-compatible endpoint. Gemini Flash is the recommended extraction model for cost and accuracy — local extraction via Ollama is supported and documented.
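For example, pointing extraction at a local Ollama server could look like the env block below, dropped into the MCP config shown above. ENGRAM_PROJECT appears in the real configs; the two LLM variables are hypothetical names for illustration. Ollama does expose an OpenAI-compatible API at /v1.
"env": {
  "ENGRAM_PROJECT": "my-project",
  "ENGRAM_LLM_BASE_URL": "http://localhost:11434/v1",
  "ENGRAM_LLM_MODEL": "llama3.1"
}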
What happens when we reverse a decision?
When a new decision supersedes an old one, Engram marks the old fact as retired. It stops appearing in retrieval. You see the current state of the project — not a history of every decision including ones you've reversed.
How accurate is retrieval?
95% recall on our benchmark across 23 questions spanning three synthetic projects (API design, auth system, data pipeline). Accuracy is higher with the --verify flag. Full methodology in the repo under benchmarks/.
What does extraction cost?
Gemini Flash costs ~$0.15/1M tokens. A typical 50-turn session generates ~5,000 tokens of transcript. Extraction cost per session: ~$0.001. For a 5-person team at 440 sessions/month: under $0.50/month in LLM extraction costs.
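Checking the arithmetic: 5,000 tokens at $0.15 per million is $0.00075, which rounds to the ~$0.001 quoted; 440 sessions × ~$0.001 ≈ $0.44/month, under the $0.50 figure.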