Autonomous Knowledge Healing
The MoE Sovereign orchestrator includes a self-healing knowledge loop that identifies and closes gaps in the Neo4j knowledge graph without human intervention.
The three compounding levels
flowchart TB
subgraph L1["Level 1: Request-time"]
R[User request] --> S[SYNTHESIS_INSIGHT]
S --> N1[Neo4j: new triples]
end
subgraph L2["Level 2: Session-time"]
C[Critic node] --> FS[Few-shot corrections]
FS --> V[Valkey persistence]
end
subgraph L3["Level 3: Nightly (autonomous)"]
G[Read ontology gaps] --> RES[MoE researches each term]
RES --> M[MERGE into Neo4j]
M --> LESS[Fewer gaps tomorrow]
end
L1 --> L2
L2 --> L3
L3 -.->|denser graph| L1
- Level 1 (per request): the merger node emits <SYNTHESIS_INSIGHT> blocks that are ingested into Neo4j as new entities and relations.
- Level 2 (per session): the critic node detects numerical deviations and persists few-shot corrections that improve future answers.
- Level 3 (nightly cron): the close_ontology_gaps.py script reads terms the system could not classify, uses the MoE orchestrator itself to research them, and writes the results back into Neo4j.
What are ontology gaps?
An ontology gap is a term that appears in an LLM answer but cannot be
matched to any existing Neo4j entity label. The orchestrator counts these
via the Prometheus metric moe_ontology_gaps_total.
Example: if the merger output mentions "Differential Privacy" but the knowledge graph has no entity with that name or alias, it is counted as a gap. The admin can see the top gaps in the Admin UI.
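The matching rule can be sketched as follows. This is a minimal illustration, assuming case-insensitive comparison against entity names and aliases; the normalisation the orchestrator actually applies is not specified here, and `find_gaps`, `normalise` and the sample data are hypothetical.

```python
from typing import Iterable


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace before comparing terms."""
    return " ".join(term.lower().split())


def find_gaps(terms: Iterable[str], known: dict[str, list[str]]) -> list[str]:
    """Return the terms that match no entity name or alias.

    `known` maps a canonical entity name to its aliases (hypothetical shape).
    """
    index = set()
    for name, aliases in known.items():
        index.add(normalise(name))
        index.update(normalise(alias) for alias in aliases)

    # Keep first-seen order and report each unknown term only once.
    seen, gaps = set(), []
    for term in terms:
        key = normalise(term)
        if key not in index and key not in seen:
            seen.add(key)
            gaps.append(term)
    return gaps


known_entities = {
    "Flask": ["flask framework"],
    "WSGI": ["Web Server Gateway Interface"],
}
print(find_gaps(["Flask", "Differential Privacy", "wsgi"], known_entities))
```

With this sample data only "Differential Privacy" is reported, because "Flask" and "wsgi" both resolve to existing names.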
How the gap healer works
Previous architecture (v1: systemd timer)
The original script scripts/close_ontology_gaps.py ran as a nightly systemd timer:
- Reads the top-N gaps from /v1/admin/ontology-gaps
- Researches each term via a single MoE orchestrator call
- Parses JSON response (entity type, aliases, relations)
- Writes entities into Neo4j via idempotent MERGE
- Logs every action for auditing
This worked for small queues but collapsed under load: a global
asyncio.Semaphore(4) caused all concurrent tasks to pile onto whichever
inference node had a warm model cache, creating 300+ hung tasks and zero
throughput on the remaining nodes.
Current architecture (v2: per-node Redis slots)
The rewrite, scripts/gap_healer_templates.py, replaces the single-process
systemd timer with a persistent async healer that is launched from the
Admin UI and manages per-node concurrency via Redis.
Core mechanism:
Redis sorted set: moe:ontology_gaps (term → ZINCRBY score)
Redis string: moe:healer:active:{node} (current slot count)
Redis string: moe:healer:runs:{node} (successful run counter)
On each cycle:
- ZPOPMAX moe:ontology_gaps N: atomic claim of the top-N gaps (ZPOPMAX takes the count as a bare argument, without a COUNT keyword)
- Per gap: check moe:healer:active:{node} against the hardware cap
- If a slot is available: INCR active:{node}, dispatch to the node's curator template
- On success: INCR runs:{node}, DECR active:{node}
- Progressive slot unlock: every 5 successful runs → cap = min(cap+1, _NODE_MAX_SLOTS[node])
Hardware caps (_NODE_MAX_SLOTS):
| Node class | Max concurrent slots |
|---|---|
| Tesla M60 (4×8 GB) | 1 |
| Tesla M10 (1×8 GB) | 3 |
| RTX 4090 (5×12 GB) | 4 |
| GT 1060 (2×6 GB) | 2 |
Nodes start with 1 slot ("cold start") and unlock additional slots progressively as runs succeed. This prevents VRAM exhaustion on the first burst while eventually reaching full throughput on stable nodes.
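The cold-start and progressive-unlock behaviour can be sketched in plain Python. This is an in-memory illustration, not the healer's code: the real counters live in Redis, the node keys are hypothetical labels for the node classes in the table, and releasing the slot on failure is an assumption the text above does not confirm.

```python
from dataclasses import dataclass

# Caps from the hardware table above; the keys are hypothetical labels.
NODE_MAX_SLOTS = {"m60": 1, "m10": 3, "rtx": 4, "gt1060": 2}
UNLOCK_EVERY = 5  # successful runs needed to unlock one more slot


@dataclass
class NodeSlots:
    node: str
    cap: int = 1     # cold start: a single slot
    active: int = 0  # mirrors moe:healer:active:{node}
    runs: int = 0    # mirrors moe:healer:runs:{node}

    def try_acquire(self) -> bool:
        """Claim a slot if the node is below its current cap."""
        if self.active >= self.cap:
            return False
        self.active += 1
        return True

    def release(self, success: bool) -> None:
        """Free the slot; only successful runs count towards unlocking."""
        self.active -= 1  # assumption: failures also free the slot
        if not success:
            return
        self.runs += 1
        if self.runs % UNLOCK_EVERY == 0:
            self.cap = min(self.cap + 1, NODE_MAX_SLOTS[self.node])


slots = NodeSlots("m10")
for _ in range(12):
    assert slots.try_acquire()
    slots.release(success=True)
print(slots.cap)  # 3: unlocked after runs 5 and 10, capped by NODE_MAX_SLOTS
```

In a multi-process deployment the check and the increment would need to be atomic on the Redis side (for example via a Lua script or INCR followed by a compensating DECR); the sketch ignores that concern.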
Why ZPOPMAX instead of ZRANGEBYSCORE: Atomic pop removes the gap from the queue at claim time, eliminating double-processing under concurrent healers and preserving the priority order (highest-frequency gaps first).
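A minimal sketch of the claim step, assuming a client whose `zpopmax(name, count)` method returns `(member, score)` pairs as redis-py does. `claim_gaps` and `InMemoryQueue` are hypothetical names; the in-memory class only stands in for a Redis/Valkey connection so that the example runs without a server.

```python
from typing import Optional, Protocol

GAP_QUEUE = "moe:ontology_gaps"


class SortedSetClient(Protocol):
    def zpopmax(self, name: str, count: Optional[int] = None) -> list: ...


def claim_gaps(client: SortedSetClient, batch: int = 10) -> list[tuple[str, int]]:
    """Pop up to `batch` highest-scored gaps; popped terms leave the queue."""
    claimed = []
    for member, score in client.zpopmax(GAP_QUEUE, batch):
        # Members arrive as bytes unless the client decodes responses.
        term = member.decode() if isinstance(member, bytes) else str(member)
        claimed.append((term, int(score)))
    return claimed


class InMemoryQueue:
    """Stand-in for a Redis/Valkey client (illustration only)."""

    def __init__(self, scores: dict[str, float]):
        self._scores = dict(scores)

    def zpopmax(self, name: str, count: Optional[int] = None) -> list:
        ranked = sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)
        top = ranked[: count or 1]
        for member, _ in top:
            del self._scores[member]
        return top


queue = InMemoryQueue({"Subnet Mask": 2, "Differential Privacy": 7, "WSGI": 4})
print(claim_gaps(queue, batch=2))
print(claim_gaps(queue, batch=2))
```

The second call returns only the remaining term, which is the property that prevents two healers from processing the same gap.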
Starting the healer
The gap healer is started from the Admin UI → Monitoring → Gap Healer panel. The admin can:
- Start / Stop the healer process
- Observe per-node slot counters and run totals in real time
- Inspect the last 50 healed terms and their classification results
There is no longer a systemd timer dependency. The healer runs as a
long-lived async task inside the moe-admin container.
Manual invocation
# Run the healer inside the moe-admin container
docker compose exec moe-admin python3 -m scripts.gap_healer_templates
# With dry-run (classify but do not write to Neo4j); -e sets the variable inside the container
docker compose exec -e DRY_RUN=1 moe-admin python3 -m scripts.gap_healer_templates
Configuration
| Variable | Default | Description |
|---|---|---|
| MOE_API_KEY | (required) | API key for the orchestrator |
| NEO4J_URI | bolt://neo4j-knowledge:7687 | Neo4j connection string |
| REDIS_URL | redis://:password@valkey:6379 | Valkey/Redis connection |
| DRY_RUN | 0 | 1 = classify without writing to Neo4j |
| HEALER_CYCLE_SLEEP | 5 | Seconds between queue poll cycles |
| HEALER_MAX_BATCH | 10 | Gaps claimed per ZPOPMAX call |
The _NODE_MAX_SLOTS dict in gap_healer_templates.py maps node name
prefixes to hardware concurrency caps. Adjust after adding or removing
inference nodes.
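The environment variables listed in the configuration table could be read along these lines. This is a sketch only: `HealerConfig` and `load_config` are hypothetical names, and the healer's real parsing (for example how it interprets DRY_RUN values other than 0 and 1) may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HealerConfig:
    moe_api_key: str
    neo4j_uri: str
    redis_url: str
    dry_run: bool
    cycle_sleep: float
    max_batch: int


def load_config(env: Mapping[str, str] = os.environ) -> HealerConfig:
    """Build the config from an environment mapping, applying documented defaults."""
    api_key = env.get("MOE_API_KEY")
    if not api_key:
        raise RuntimeError("MOE_API_KEY is required")
    return HealerConfig(
        moe_api_key=api_key,
        neo4j_uri=env.get("NEO4J_URI", "bolt://neo4j-knowledge:7687"),
        redis_url=env.get("REDIS_URL", "redis://:password@valkey:6379"),
        dry_run=env.get("DRY_RUN", "0") == "1",  # assumption: only "1" enables it
        cycle_sleep=float(env.get("HEALER_CYCLE_SLEEP", "5")),
        max_batch=int(env.get("HEALER_MAX_BATCH", "10")),
    )


cfg = load_config({"MOE_API_KEY": "example-key", "DRY_RUN": "1"})
print(cfg.dry_run, cfg.max_batch)
```

Passing the mapping explicitly keeps the function easy to test; calling `load_config()` with no argument reads the process environment.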
Ontology anatomy: entities, relations, gaps
The healing loop operates on three interlocking data structures:
Neo4j entities — nodes in the knowledge graph, each carrying a canonical
name, a typed classification (e.g. Framework, Protocol, Concept), a
bilingual description and a confidence score. Every entity is a "known
term" the system can reason about.
Neo4j relations — typed, directed edges (IS_A, USES, IMPLEMENTS,
PART_OF, EXTENDS, RELATED_TO) that turn the node set into an actual
graph. Relations are what make the ontology queryable beyond a flat
vocabulary — GraphRAG traverses them at retrieval time.
Ontology gaps — entries in the Valkey sorted set moe:ontology_gaps.
Each gap is a term that appeared in an answer but could not be matched to
any existing entity's name or alias. The score counts how often the term
was seen. Gaps are the graph's "known unknowns": terms the system has
encountered but not yet taxonomised.
The loop closes as follows. When a user asks a question, the orchestrator
produces an answer and extracts noun-like terms from it. Each term is
checked against Neo4j. Terms that match existing entities are retained as
provenance. Terms that do not match are ZINCRBY-ed into
moe:ontology_gaps. The gap healer later pulls the top gaps, routes them
through a curator expert template for classification, writes the resulting
entities and relations into Neo4j via idempotent MERGE, and ZREMs the
resolved term from the gap queue.
user request ──▶ answer ──▶ term extraction
│
├─ known term ──▶ Neo4j entity (provenance)
│
└─ unknown term ──▶ Valkey gap queue
│
gap healer ◀────────────────── │
│
▼
curator template classifies
│
▼
MERGE entity + relations
into Neo4j
│
▼
ZREM term from gap queue
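The MERGE step at the bottom of the loop can be sketched as a function that turns one curator classification into Cypher statements. The `:Entity` label, the property names and the shape of the `classification` dict are assumptions for illustration; only the relation types come from the list above. Because relationship types cannot be supplied as ordinary Cypher query parameters, the sketch validates them against a whitelist before interpolating them into the query text.

```python
ALLOWED_RELATIONS = {"IS_A", "USES", "IMPLEMENTS", "PART_OF", "EXTENDS", "RELATED_TO"}

# Hypothetical label and property names.
ENTITY_CYPHER = (
    "MERGE (e:Entity {name: $name}) "
    "SET e.type = $type, e.aliases = $aliases"
)


def relation_cypher(rel_type: str) -> str:
    """Return an idempotent MERGE for one whitelisted relation type."""
    if rel_type not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relation type: {rel_type}")
    return (
        "MERGE (a:Entity {name: $source}) "
        "MERGE (b:Entity {name: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def build_statements(classification: dict) -> list[tuple[str, dict]]:
    """Translate one classification into (cypher, parameters) pairs."""
    name = classification["name"]
    statements = [(
        ENTITY_CYPHER,
        {
            "name": name,
            "type": classification["type"],
            "aliases": classification.get("aliases", []),
        },
    )]
    for rel in classification.get("relations", []):
        statements.append(
            (relation_cypher(rel["type"]), {"source": name, "target": rel["target"]})
        )
    return statements


example = {
    "name": "Flask",
    "type": "Framework",
    "aliases": ["flask framework"],
    "relations": [{"type": "IMPLEMENTS", "target": "WSGI"}],
}
for cypher, params in build_statements(example):
    print(cypher, params)
```

Running the same statements twice leaves the graph unchanged, which is what makes retries after a partial failure safe.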
The self-replenishing trap
A naive implementation turns the healer into a net-positive gap producer:
classifying Flask generates a description like "Flask is a Python
web framework that implements WSGI", whose noun extraction adds
Python, WebFramework, and WSGI as three new gaps. One resolved term
produces three new ones, so the gap queue grows monotonically.
The MoE orchestrator prevents this by tagging every Kafka moe.ingest
message with the originating template_name. When the gap-detection
branch in the ingest consumer sees a template name containing
ontology-curator, it skips the ZINCRBY step entirely: curator
responses are classifications of gaps, not sources of them. This
single flag converts the loop from divergent to convergent.
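A sketch of that guard, assuming the ingest message is a dict carrying the `template_name` field described above. `should_record_gaps` and `record_gaps` are hypothetical names, and a plain dict stands in for the ZINCRBY calls on moe:ontology_gaps.

```python
CURATOR_MARKER = "ontology-curator"


def should_record_gaps(message: dict) -> bool:
    """Curator responses classify gaps, so they must not create new ones."""
    template = message.get("template_name") or ""
    return CURATOR_MARKER not in template


def record_gaps(message: dict, unknown_terms: list[str], queue: dict[str, int]) -> None:
    """Count unknown terms unless the message came from a curator template."""
    if not should_record_gaps(message):
        return
    for term in unknown_terms:
        # Stands in for: ZINCRBY moe:ontology_gaps 1 <term>
        queue[term] = queue.get(term, 0) + 1


queue: dict[str, int] = {}
record_gaps({"template_name": "moe-ontology-curator-n04-rtx"}, ["Python", "WSGI"], queue)
record_gaps({"template_name": "general-chat"}, ["WSGI"], queue)  # hypothetical template
print(queue)
```

Only the second message contributes to the queue, so classifying Flask no longer feeds Python and WSGI back in as new gaps.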
Parallel healing via per-node curator templates
For a hetero-GPU cluster (mixed RTX and Tesla nodes) the healer uses a pool of per-node curator templates, each pinning its planner, all expert roles and the judge to a single physical node. This keeps the warm-model cache local to one GPU and lets four or five concurrent classifications land on four or five different nodes without routing contention.
Example pool:
moe-ontology-curator-n04-rtx → 5×12 GB RTX, planner qwen2.5:7b,
general mistral-nemo:12b
moe-ontology-curator-n06-m10-01 → 1×8 GB Tesla M10, mistral-nemo:latest
moe-ontology-curator-n06-m10-02 → 1×8 GB Tesla M10, glm4:9b
moe-ontology-curator-n06-m10-03 → 1×8 GB Tesla M10, llama3.1:8b
moe-ontology-curator-n06-m10-04 → 1×8 GB Tesla M10, hermes3:8b
moe-ontology-curator-n07-gt → 2×6 GB GT 1060, uniform 7B quantised
moe-ontology-curator-n09-m60 → 4×8 GB Tesla M60, mistral:7b + hermes3:8b
moe-ontology-curator-n11-m10-01 → 1×8 GB Tesla M10, mistral:7b
moe-ontology-curator-n11-m10-02 → 1×8 GB Tesla M10, glm4:9b (cross-node vs N06-02)
moe-ontology-curator-n11-m10-03 → 1×8 GB Tesla M10, llama3.1:8b (cross-node vs N06-03)
moe-ontology-curator-n11-m10-04 → 1×8 GB Tesla M10, qwen2.5:7b
The healer rotates round-robin across the pool. Because each template
has an explicit @node suffix on every model reference, the
sticky-session router in _select_node is bypassed entirely: load
balancing happens at the client, not inside the orchestrator.
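Client-side rotation over the pool can be sketched with `itertools.cycle`. The template names are taken from the example pool; `assign_round_robin` is a hypothetical helper, and the real healer additionally checks the per-node slot counters before dispatching.

```python
from itertools import cycle

CURATOR_POOL = [
    "moe-ontology-curator-n04-rtx",
    "moe-ontology-curator-n07-gt",
    "moe-ontology-curator-n09-m60",
]


def assign_round_robin(terms: list[str], templates: list[str]) -> list[tuple[str, str]]:
    """Pair each term with the next template, wrapping around the pool."""
    if not templates:
        raise ValueError("curator pool is empty")
    pool = cycle(templates)
    return [(term, next(pool)) for term in terms]


gaps = ["Differential Privacy", "Subnet Mask", "Broadcast Address", "WSGI"]
for term, template in assign_round_robin(gaps, CURATOR_POOL):
    print(f"{term} -> {template}")
```

The fourth term wraps around to the first template, so consecutive classifications land on different nodes as long as the batch is no larger than the pool.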
Lessons learned during deployment:
- Per-node templates beat floating mode for sustained throughput. Floating routing collapsed onto a single warm node within seconds.
- Tesla M60 and M10 crash on 9B+ at Q4. Swapped to mistral:7b and llama3.1:8b (both Q4_K_M, ~4.5 GB file, ~7 GB runtime VRAM).
- The admin HTTP endpoint is not durable under load. The healer now reads the gap queue from Valkey directly and uses the admin endpoint only as a fallback.
- Neo4j property polymorphism bites silently. Some legacy entities have name stored as a StringArray; toLower on an array throws a Cypher type error that asyncio.gather(return_exceptions=True) will swallow. The healer filters array-typed names with valueType(e.name) STARTS WITH 'LIST'.
- Permissions must include the curator template IDs. The orchestrator silently falls back to the first allowed template when the requested one is not in the user's permission set, routing requests to the wrong node. Grant expert_template perms for every curator template ID.
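The silent-failure mode behind the polymorphism lesson is easy to reproduce: with `return_exceptions=True`, `asyncio.gather` hands exceptions back as ordinary results, and they vanish unless the caller inspects them. The sketch below shows one way to surface them; `heal` and `heal_all` are hypothetical, and the raised `TypeError` merely simulates the Cypher type error.

```python
import asyncio
import logging

log = logging.getLogger("gap_healer")


async def heal(term: str) -> str:
    """Pretend to heal one term; 'bad' simulates a list-valued name property."""
    if term == "bad":
        raise TypeError("toLower() called on a list-valued property")
    return f"healed:{term}"


async def heal_all(terms: list[str]) -> tuple[list[str], list[tuple[str, BaseException]]]:
    """Run all heals concurrently and separate successes from failures."""
    results = await asyncio.gather(
        *(heal(term) for term in terms), return_exceptions=True
    )
    healed, failed = [], []
    # gather preserves input order, so results line up with terms.
    for term, result in zip(terms, results):
        if isinstance(result, BaseException):
            log.error("healing %r failed: %r", term, result)
            failed.append((term, result))
        else:
            healed.append(result)
    return healed, failed


healed, failed = asyncio.run(heal_all(["Flask", "bad", "WSGI"]))
print(healed, [term for term, _ in failed])
```

Logging the failed terms makes a recurring type error visible in the audit trail instead of looking like an empty result.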
Observed growth
After a benchmark session (71 jobs, 9 cognitive tests):
| Metric | Before | After | Growth |
|---|---|---|---|
| Graph entities | ~400 | 1,542 | +285% |
| Graph relations | ~200 | 1,264 | +532% |
| Ontology gaps | 0 | 140 | 140 new terms identified |
The 140 gaps are terms such as "Legitimate Interest Assessment", "Data Protection Impact Assessment", "Differential Privacy", "Subnet Mask" and "Broadcast Address": domain terminology that the base ontology (~400 entities) does not yet cover. The gap healer will research and classify these automatically.