# Sovereign MoE – Documentation
Self-hosted multi-model orchestrator that routes requests to specialized local LLMs, enriches context via a Neo4j knowledge graph and web search, and synthesizes results with a Judge LLM. It exposes an OpenAI-compatible API endpoint and works with Claude Code, Continue.dev, and any other OpenAI-compatible client.
## Quick Navigation
| Section | Pages | Description |
|---|---|---|
| Installation | Installation · First-Time Setup | Install on Debian, deploy the stack, run the Setup Wizard |
| User Handbook | Quick Start · Handbook · API | Getting started, modes, skills, vision, API usage |
| Admin Backend | Overview | Manage users, budgets, templates, profiles |
| Federation | Overview | MoE Libris – federated knowledge exchange between nodes |
| User Portal | Overview | Self-service for end users: usage, keys, billing |
| Reference | Authentication · Expert Prompts · Import/Export | API reference, system prompts, schemas |
| FAQ | FAQ | Common questions about Claude Code, API, troubleshooting |
| Changelog | Changelog | Version history of all releases |
## Service Overview

| Service | URL | Purpose |
|---|---|---|
| Orchestrator API | http://localhost:8002/v1 | Main endpoint (OpenAI-compatible) |
| Admin UI | http://localhost:8088 | Configuration & monitoring |
| User Portal | http://localhost:8088/user/dashboard | End-user interface |
| Log Viewer (Dozzle) | https://logs.moe-sovereign.org | Browser-based container log viewer |
| Grafana | http://localhost:3001 | Metrics dashboards |
| Prometheus | http://localhost:9090 | Raw metrics |
| Neo4j Browser | http://localhost:7474 | Knowledge graph explorer |
| MCP Server | http://localhost:8003 | Precision tools |
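Because the Orchestrator API is OpenAI-compatible, any HTTP client can talk to it. The sketch below builds a standard chat-completion request against the endpoint from the table above using only the Python standard library. The model name `"auto"` and the key prefix `moe-sk-...` are illustrative assumptions — check your deployment for the model names the router actually accepts and use a key issued via the Admin UI or User Portal.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002/v1"  # Orchestrator API (OpenAI-compatible)
API_KEY = "moe-sk-your-key-here"       # placeholder; issue a real key first

def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the orchestrator.

    "auto" is a hypothetical model name standing in for whatever routing
    alias your deployment exposes.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a running stack):
#     with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#         print(json.load(resp))
```

The request shape is identical to what the official OpenAI SDKs emit, which is why clients like Continue.dev can point at this endpoint unchanged.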
## 7B Ensemble — GPT-4o Class Performance, Self-Hosted
Benchmark result (April 2026): 8 domain-specialist 7–9B models on legacy Tesla M10 GPUs achieve 6.11 / 10 on MoE-Eval — the same score class as GPT-4o mini — with zero data leaving the cluster. Three consecutive overnight epochs, 36 scenarios, 0 failures.
| | Single 7B | 8× 7B Ensemble | 30B+14B Orchestrated | H200 Cloud (120B) |
|---|---|---|---|---|
| MoE-Eval Score | 3.3–3.6 / 10 | 6.11 / 10 | 7.60 / 10 | 9.00 / 10 |
| VRAM required | 8 GB | 88 GB (distributed) | 80 GB RTX cluster | 141 GB (H200) |
| Data sovereignty | ✅ | ✅ | ✅ | ❌ Cloud |
| Per-token cost | €0 | €0 | €0 | Metered |
The key insight: specialisation beats scale. meditron:7b handles medical QA better than
a general-purpose 14B model; mathstral:7b outperforms general models on MATH tasks; qwen2.5-coder:7b
leads SWE-bench in its class. Routing each sub-task to its specialist model compounds these
advantages without requiring any single model to be large enough to cover all domains.
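The routing idea can be sketched in a few lines. This is a deliberately naive keyword classifier, not the orchestrator's actual router (which uses an LLM-based pipeline); the keyword sets and the `general:14b` fallback name are illustrative assumptions. The specialist model names come from the paragraph above.

```python
# Hypothetical sketch of domain-based routing; NOT the shipped router.
SPECIALISTS = {
    "medical": "meditron:7b",       # medical QA specialist
    "math": "mathstral:7b",         # MATH-task specialist
    "code": "qwen2.5-coder:7b",     # SWE-bench class leader
}

# Toy keyword sets standing in for a real classifier.
KEYWORDS = {
    "medical": {"diagnosis", "symptom", "dosage", "patient"},
    "math": {"integral", "prove", "equation", "theorem"},
    "code": {"function", "bug", "refactor", "stacktrace"},
}

def route(task: str, default: str = "general:14b") -> str:
    """Pick the specialist whose keyword set best matches the task."""
    words = set(task.lower().split())
    scores = {domain: len(words & kws) for domain, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a generalist when no specialist matches at all.
    return SPECIALISTS[best] if scores[best] > 0 else default
```

Each sub-task only ever needs the expertise of one small model, which is why the ensemble fits on legacy GPUs while scoring in a much larger model's class.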
→ Full benchmark report and LLM comparison
## CLI Agents — Best Of
MoE Sovereign works with any OpenAI-compatible client, but execution-loop agents like Aider, Open Interpreter, and Continue.dev unlock the full capability stack: correction memory, semantic caching, domain-expert routing, and the Knowledge Graph all activate through their natural try → fail → fix loops.
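The try → fail → fix loop that activates correction memory can be illustrated abstractly. This is a conceptual sketch, not the system's implementation: the function names and the in-memory dict standing in for the correction store are assumptions made for illustration.

```python
# Conceptual sketch of the try -> fail -> fix loop behind correction memory.
# Every failed attempt is recorded, so later attempts (and future sessions
# hitting the same task) are steered away from known-bad answers.

def try_fail_fix(task, attempt, is_success, memory, max_tries=3):
    """Run `attempt` until `is_success` passes, remembering failures.

    `memory` maps a task to the list of results that already failed;
    `attempt` receives that list so it can avoid repeating them.
    """
    for _ in range(max_tries):
        result = attempt(task, memory.get(task, []))
        if is_success(result):
            return result
        memory.setdefault(task, []).append(result)  # correction memory
    return None
```

Execution-loop agents drive exactly this cycle for free: they run the generated code, observe the failure, and retry — each round enriching the failure record that a one-shot chat client never produces.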
| Page | What it covers |
|---|---|
| CLI Agents — Best Of | Plain-language explanation of why and how, Before/After comparison, connection examples for each tool |
| Architectural Deep Dive | Delta table, Mermaid data-flow diagrams, measured thresholds from the implementation |
## Connecting with Claude Code

Point Claude Code at the orchestrator by adding the endpoint to your settings file (typically `~/.claude/settings.json`):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8002/v1",
    "ANTHROPIC_API_KEY": "moe-sk-..."
  }
}
```
Alternatively, configure a profile in the Admin UI under Profiles and enable it.
## Documentation Structure

```mermaid
graph LR
    D[docs/]
    D --> IDX[index.md<br/>this page]
    D --> FAQ["faq.md<br/>Frequently asked questions<br/>(Claude Code, API, troubleshooting)"]
    D --> CL[changelog.md<br/>Version history]
    D --> G[guide/]
    D --> A[admin/]
    D --> P[portal/]
    D --> R[reference/]
    G --> GIDX[index.md<br/>User handbook – overview]
    G --> GQ[quickstart.md<br/>Services, pipeline, getting started]
    G --> GH[handout.md<br/>Complete user handbook]
    G --> GA["api.md<br/>API access, keys, curl & SDK examples"]
    A --> AIDX[index.md<br/>Admin backend documentation]
    P --> PIDX[index.md<br/>User portal documentation]
    R --> RA["auth.md<br/>Authentication (OIDC, API key)"]
    R --> REP[expert-prompts.md<br/>System prompts for all expert roles]
    R --> RI[import-export.md<br/>JSON schemas for templates and profiles]
```
## Stack
| Component | Role |
|---|---|
| LangGraph | Pipeline orchestration |
| Ollama | Local LLM inference |
| ChromaDB | Semantic vector cache |
| Valkey | Checkpoints, budget counters, scoring |
| Neo4j 5 | Knowledge graph (GraphRAG) |
| Apache Kafka | Event streaming & async learning |
| Prometheus + Grafana | Metrics & dashboards |
| FastAPI + uvicorn | HTTP API layer |
| PostgreSQL | User database |