Engram · by Unbidden AI
Persistent memory for AI-assisted development. Works with any OpenAI-compatible model via MCP or REST API.
# Query your project memory
engram query my-project "what auth approach did we decide on?"
→ Auth: JWT, stateless, multi-region. Decided 2026-03-01.
→ Database: PostgreSQL via Supabase. Rejected SQLite.
The problem
Engram was built specifically for each of the problems below. Not workarounds: architectural fixes.
Your 4K context fills up by turn 15, and every session starts with re-explaining what you built two days ago. Engram makes context effectively unlimited: only the relevant facts, no noise.
You're paying for every token in history. Stuffing prior context into every call is expensive. Engram cuts that by 60–80% by sending only what matters for the current task.
By week 6, the AI contradicts decisions from week 1. You've said "we're using PostgreSQL" six times. Engram makes that decision permanent: it retrieves in week 12 exactly as it did in week 1.
How it works
A graph-based knowledge store with semantic retrieval and temporal decay scoring.
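One plausible shape for that scoring, as an illustration rather than Engram's published formula: score = semantic_similarity × exp(-age / half_life). Recent, relevant facts rank highest; stale facts fade unless reinforced; superseded facts are removed outright rather than left to decay.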
One config block in Claude Code, Cursor, Windsurf, or Continue.dev. The MCP server runs as a background process — zero impact on your editor.
At session end, engram_extract processes your conversation. Decisions, constraints, implementation details, and open questions are extracted and merged into a persistent knowledge graph.
engram_query returns the facts most relevant to your current task. Superseded facts retire automatically — old decisions don't clutter the context.
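For intuition, a stored fact might look something like the record below. The field names are illustrative assumptions, not Engram's actual schema; the point is the metadata each fact carries: a type, a timestamp that feeds decay scoring, and a status flag that retirement flips.
{
  "type": "decision",
  "text": "Auth: JWT, stateless, multi-region",
  "project": "my-project",
  "recorded": "2026-03-01",
  "status": "active",
  "supersedes": null
}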
Integrations
If it speaks MCP or HTTP, it works with Engram.
Claude Code — ~/.claude/settings.json
{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp-serve"],
      "env": {
        "ENGRAM_PROJECT": "my-project"
      }
    }
  }
}
OpenClaw — openclaw.json
{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp-serve"],
      "env": {
        "ENGRAM_PROJECT": "my-project"
      }
    }
  }
}
See the full integration guide for Cursor, Windsurf, Continue.dev, Cline, and Zed — with exact config file paths for each tool. Using Hermes? See the Hermes MemoryProvider guide →
Get started
Local mode. No account required. No credit card.
FAQ
How is this different from RAG?
RAG retrieves from a document corpus — you put documents in, it fetches chunks when asked. Engram builds memory from conversations — it watches what you're working on, extracts decisions and facts as you go, and surfaces them in future sessions automatically. It's session memory, not document search.
How is this different from built-in memory features?
Built-in memory tools summarize old context or drop it when the context window fills. Architectural decisions from early in a project eventually disappear. Engram extracts structured facts — it doesn't summarize or discard. A decision from week 1 retrieves just as accurately in week 12 as it did on day 2.
Where does my data go?
Local mode: Nothing leaves your machine. The MCP server runs locally, extraction calls your configured LLM endpoint, and the memory store is a local file. Hosted mode: Session text is sent to the Engram API for extraction. Your data is always exportable via engram export.
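As a sketch, a backup could be a one-liner; the output format and exact arguments here are assumptions, so check the docs:
engram export my-project > my-project-memory.json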
Can I use my own model?
Yes. Engram works with any OpenAI-compatible endpoint. Gemini Flash is the recommended extraction model for cost and accuracy — local extraction via Ollama is supported and documented.
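For example, pointing extraction at a local Ollama server could look like the env block below, dropped into the MCP config shown above. ENGRAM_PROJECT appears in the real configs; the two LLM variables are hypothetical names for illustration. Ollama does expose an OpenAI-compatible API at /v1.
"env": {
  "ENGRAM_PROJECT": "my-project",
  "ENGRAM_LLM_BASE_URL": "http://localhost:11434/v1",
  "ENGRAM_LLM_MODEL": "llama3.1"
}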
What happens when we reverse a decision?
When a new decision supersedes an old one, Engram marks the old fact as retired. It stops appearing in retrieval. You see the current state of the project — not a history of every decision including ones you've reversed.
How accurate is retrieval?
95% recall on our benchmark across 23 questions spanning three synthetic projects (API design, auth system, data pipeline). Accuracy is higher with the --verify flag. Full methodology in the repo under benchmarks/.
What does extraction cost?
Gemini Flash costs ~$0.15/1M tokens. A typical 50-turn session generates ~5,000 tokens of transcript. Extraction cost per session: ~$0.001. For a 5-person team at 440 sessions/month: under $0.50/month in LLM extraction costs.
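Checking the arithmetic: 5,000 tokens at $0.15 per million is $0.00075, which rounds to the ~$0.001 quoted; 440 sessions × ~$0.001 ≈ $0.44/month, under the $0.50 figure.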