Memory Palace¶
Design Principle: Raw knowledge retrieval is noisy. A corpus that spans code reviews, legal questions, and medical consultations must not treat all stored facts as interchangeable. The Memory Palace gives each domain its own namespace, lets the planner attach retrieval filters to its plan, and provides a persistent ingest path so that even session-level knowledge — from tools like Claude Code — survives context resets.
Overview¶
Three related features work together to make memory retrieval precise and domain-aware:
| Feature | What it does |
|---|---|
| Metadata-Filtered Search | The planner optionally extracts domain filters from the query; graph_rag_node applies them as a ChromaDB where clause for scoped retrieval |
| Isolated Expert Memory | Every fact stored in ChromaDB or Neo4j is tagged with the source expert category (expert_domain); reads can be filtered to stay within a domain |
| Auto-Save Hooks | Claude Code hook scripts and a new /v1/memory/ingest endpoint allow external sessions to persist key decisions into the knowledge base before context is lost |
Feature 1 — Metadata-Filtered Semantic Search¶
Motivation¶
Pure vector similarity returns the most similar documents regardless of domain. A query
about "Python list comprehensions" might surface cached responses from a data_analyst
session that happen to mention Python, rather than the directly relevant code_reviewer
answer. With millions of cached responses across all expert domains this noise grows
significantly.
Metadata filtering restricts the vector search to a specific namespace before computing cosine similarity — the equivalent of searching only the relevant "room" of the memory palace.
How it Works¶
sequenceDiagram
participant Client
participant Planner as Planner LLM
participant State as AgentState
participant GR as graph_rag_node
participant Chroma as ChromaDB moe_fact_cache
Client->>Planner: Request (e.g. "review this Python code")
Note over Planner: Optionally adds metadata_filters<br/>to first task object in plan JSON
Planner-->>State: plan=[{task, category, metadata_filters?}]<br/>metadata_filters={expert_domain:"code_reviewer"}
State->>GR: graph_rag_node reads metadata_filters from state
GR->>Chroma: query(query_texts=[input], n_results=3,<br/>where={"expert_domain":{"$eq":"code_reviewer"}})
Chroma-->>GR: 3 domain-filtered documents
GR-->>State: graph_context += "[Domain-Filtered Memory (expert_domain=code_reviewer)]\n..."
Planner Integration¶
The planner prompt contains an optional instruction:
OPTIONAL: Add a "metadata_filters" key to the FIRST task object when the domain
is unambiguous, to scope downstream memory retrieval. Use string values only.
Omit when unsure.
Example: {"task": "...", "category": "code_reviewer",
"metadata_filters": {"expert_domain": "code_reviewer", "project": "frontend"}}
The planner includes metadata_filters only when it is confident — e.g., a clearly
domain-specific query. The parser in planner_node extracts the dict from the first task
before calling _sanitize_plan(), storing it in AgentState.metadata_filters.
ChromaDB where Clause¶
graph_rag_node builds the where clause from the state field:
_where = {k: {"$eq": v} for k, v in _meta_filters.items() if isinstance(v, str) and v}
if len(_where) > 1:
_where = {"$and": [{k: v} for k, v in _where.items()]}
_chroma_res = await asyncio.to_thread(
cache_collection.query,
query_texts=[state["input"]],
n_results=3,
where=_where,
)
The filtered results are appended to graph_context under a [Domain-Filtered Memory]
label. Any ChromaDB error silently degrades to unfiltered Neo4j-only context — the
filtered path is always additive, never a hard dependency.
Supported Filter Keys¶
Any string-valued metadata field stored on ChromaDB documents can be used:
| Key | Example Value | Description |
|---|---|---|
expert_domain |
"code_reviewer" |
Expert category that created the cached response |
project |
"frontend" |
Optional project tag added by the planner |
Feature 2 — Isolated Expert Memory¶
Motivation¶
Without domain isolation, a medical_consult expert's cached knowledge pollutes
the retrieval context of a code_reviewer expert and vice versa. Isolation ensures that:
- ChromaDB only surfaces domain-relevant cached responses when filtered
- Neo4j entities and relations carry their origin domain for graph traversal filtering and provenance inspection
Write Path: expert_domain Tagging¶
Every time merger_node stores a response, it derives _expert_domain from the plan:
# Priority: safety-critical domains first, then first plan category
_expert_domain = next(
(c for c in ("medical_consult", "legal_advisor", "technical_support")
if c in _plan_cats_early),
_plan_cats_early[0] if _plan_cats_early else "general",
)
This value is then written to three places:
1. ChromaDB metadata¶
metadatas=[{
"ts": datetime.now().isoformat(),
"input": state["input"][:200],
"flagged": False,
"expert_domain": _expert_domain, # NEW
}]
2. Kafka moe.ingest payload¶
{
"domain": ingest_domain,
"source_expert": ingest_domain, # NEW — forwarded by Kafka consumer
...
}
3. Neo4j via graph_manager¶
The Kafka consumer extracts source_expert and passes it to both ingest methods as
expert_domain. The graph manager writes it to:
:Entitynodes —ON CREATE SET ... a.expert_domain = $expert_domain- Extracted relations —
ON CREATE SET ... r.expert_domain = $expert_domain :Synthesisnodes —ON CREATE SET ... s.expert_domain = $expert_domain
Updated Neo4j Schema¶
See also Graph-basierte Wissensakkumulation — Neo4j Schema.
:Entity nodes — new property:
| Property | Type | Description |
|---|---|---|
expert_domain |
string | Source expert category (e.g. "medical_consult") |
Relations on :Entity edges — new property:
| Property | Type | Description |
|---|---|---|
expert_domain |
string | Expert category that contributed the triple |
:Synthesis nodes — new property:
| Property | Type | Description |
|---|---|---|
expert_domain |
string | Expert category that generated the synthesis insight |
Querying by Domain¶
ChromaDB — filter to a single expert namespace¶
Neo4j — entities from a specific expert¶
Neo4j — syntheses from a specific expert¶
MATCH (s:Synthesis {expert_domain: "legal_advisor"})
RETURN s.text, s.insight_type, s.confidence
ORDER BY s.created DESC
LIMIT 10
Neo4j — cross-domain contamination check¶
MATCH (e:Entity)
WHERE e.expert_domain IS NOT NULL
RETURN e.expert_domain, count(e) AS entity_count
ORDER BY entity_count DESC
Feature 3 — Claude Code Auto-Save Hooks¶
Motivation¶
Claude Code operates with a sliding context window. When the window fills up, older context is compacted and summarised — key technical decisions, constraints, and reasoning can be lost. The Auto-Save Hooks provide a two-part mechanism to prevent this:
- Hook scripts (
hooks/) capture context at the moment of compaction or session end and POST it to the orchestrator /v1/memory/ingestendpoint accepts the payload and enqueues it onmoe.ingestfor async GraphRAG processing — the same pipeline used for all other knowledge ingest
The result: session-level knowledge (architecture decisions, bug root causes, code constraints) flows into Neo4j and becomes retrievable in future sessions.
Architecture¶
sequenceDiagram
participant CC as Claude Code
participant Hook as Hook Script (bash)
participant API as POST /v1/memory/ingest
participant Kafka as moe.ingest
participant Consumer as Kafka Consumer
participant GM as GraphRAGManager
participant Neo4j
participant Chroma as ChromaDB
CC->>Hook: PreCompact / Stop event (stdin JSON)
Hook->>Hook: Extract summary from stdin
Hook->>API: POST {session_summary, key_decisions, domain}
API-->>Hook: {"status": "queued"}
API->>Kafka: publish to moe.ingest
Kafka->>Consumer: async consume
Consumer->>GM: extract_and_ingest(answer=summary, expert_domain="session")
GM->>Neo4j: MERGE :Entity + relations
GM->>Chroma: (via merger_node on next response)
/v1/memory/ingest Endpoint¶
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
session_summary |
string | yes | Text content to persist (decisions, context, insights) |
key_decisions |
list[string] | no | Bullet-point decisions, appended to summary |
domain |
string | no | Knowledge domain tag (default: "session") |
source_model |
string | no | Hook identifier (default: "claude-code-hook") |
confidence |
float | no | Confidence score 0.0–1.0 (default: 0.8) |
Response:
The endpoint publishes to moe.ingest and returns immediately — persistence is
asynchronous and does not block the caller.
Example:
curl -X POST http://localhost:8002/v1/memory/ingest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MOE_API_KEY" \
-d '{
"session_summary": "Decision: use Valkey for session caching, not in-memory dicts. Rationale: multi-instance deployment.",
"key_decisions": ["Valkey over in-memory cache", "TTL=3600 for session keys"],
"domain": "technical_support"
}'
Hook Scripts¶
Both scripts live in hooks/ at the project root. They are invoked by Claude Code and
receive context as JSON on stdin.
hooks/mempal_precompact_hook.sh — PreCompact hook¶
Fires before Claude Code compacts the context window. The stdin payload includes a
summary field with the LLM-generated compaction summary — the highest-signal snapshot
available before context is truncated.
# Configuration in ~/.claude/settings.json:
{
"hooks": {
"PreCompact": [{
"matcher": "",
"hooks": [{"type": "command",
"command": "/path/to/moe-infra/hooks/mempal_precompact_hook.sh"}]
}]
}
}
hooks/mempal_save_hook.sh — Stop hook¶
Fires when Claude Code finishes a response. Extracts the last 10 assistant messages from the transcript and POSTs the concatenated text as a session summary.
# Configuration in ~/.claude/settings.json:
{
"hooks": {
"Stop": [{
"matcher": "",
"hooks": [{"type": "command",
"command": "/path/to/moe-infra/hooks/mempal_save_hook.sh"}]
}]
}
}
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
MOE_MEMORY_ENDPOINT |
http://localhost:8002/v1/memory/ingest |
Full URL of the ingest endpoint |
MOE_API_KEY |
"" |
API key for authentication (leave empty if not configured) |
Full Setup¶
See hooks/README.md for step-by-step configuration instructions including shell profile
setup and verification commands.
Interaction Between Features¶
The three features form a coherent stack:
flowchart TD
P[Planner extracts metadata_filters] --> F1["graph_rag_node performs filtered ChromaDB query<br/><i>Feature 1 — scoped retrieval</i>"]
F1 --> F2["merger_node writes response to ChromaDB<br/>+ tags with expert_domain<br/>+ publishes to Kafka with source_expert<br/><i>Feature 2 — domain isolation</i>"]
F2 --> HOOK["Claude Code hook fires (PreCompact / Stop)"]
HOOK --> POST["POST /v1/memory/ingest"]
POST --> KQ["enqueued to moe.ingest Kafka"]
KQ --> GR["GraphRAG processes it with<br/>expert_domain = session<br/><i>Feature 3 — external persistence</i>"]
In a typical multi-session workflow:
- A
code_reviewersession builds up knowledge — stored withexpert_domain=code_reviewer - A later
code_reviewersession asks a similar question — the planner setsmetadata_filters={"expert_domain": "code_reviewer"}— only code review facts surface - Before the context window fills, the PreCompact hook fires — the session summary is
persisted to Neo4j as
expert_domain=session, available for future cross-domain retrieval
Operational Reference¶
Verify expert_domain tagging in ChromaDB¶
# Connect to ChromaDB (port 8001)
import chromadb
client = chromadb.HttpClient(host="localhost", port=8001)
col = client.get_collection("moe_fact_cache")
results = col.get(where={"expert_domain": {"$eq": "code_reviewer"}}, limit=5)
print(results["documents"])
Query session memory in Neo4j¶
-- All session-sourced entities
MATCH (e:Entity {expert_domain: "session"})
RETURN e.name, e.type, e.domain
ORDER BY e.name
LIMIT 20
Check ingest endpoint health¶
curl -s http://localhost:8002/v1/memory/ingest \
-X POST \
-H "Content-Type: application/json" \
-d '{"session_summary":"health-check","domain":"session"}' | jq .
# Expected: {"status":"queued","domain":"session","length":12}