# Sovereign MoE – Documentation
Self-hosted multi-model orchestrator that routes requests to specialized local LLMs, enriches context via a Neo4j knowledge graph and web search, and synthesizes results with a Judge LLM. It exposes an OpenAI-compatible API endpoint and works with Claude Code, Continue.dev, and any other OpenAI-compatible client.
## Quick Navigation
| Section | Pages | Description |
|---|---|---|
| Installation | Installation · First-Time Setup | Install on Debian, deploy the stack, run the Setup Wizard |
| User Handbook | Quick Start · Handbook · API | Getting started, modes, skills, vision, API usage |
| Admin Backend | Overview | Manage users, budgets, templates, profiles |
| Federation | Overview | MoE Libris – federated knowledge exchange between nodes |
| User Portal | Overview | Self-service for end users: usage, keys, billing |
| Reference | Authentication · Expert Prompts · Import/Export | API reference, system prompts, schemas |
| FAQ | FAQ | Common questions about Claude Code, API, troubleshooting |
| Changelog | Changelog | Version history of all releases |
## Service Overview

| Service | URL | Purpose |
|---|---|---|
| Orchestrator API | http://localhost:8002/v1 | Main endpoint (OpenAI-compatible) |
| Admin UI | http://localhost:8088 | Configuration & monitoring |
| User Portal | http://localhost:8088/user/dashboard | End-user interface |
| Log Viewer (Dozzle) | https://logs.moe-sovereign.org | Browser-based container log viewer |
| Grafana | http://localhost:3001 | Metrics dashboards |
| Prometheus | http://localhost:9090 | Raw metrics |
| Neo4j Browser | http://localhost:7474 | Knowledge graph explorer |
| MCP Server | http://localhost:8003 | Precision tools |
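Because the Orchestrator API is OpenAI-compatible, any HTTP client can talk to it. The sketch below builds a standard chat-completion request against the endpoint from the table above using only the Python standard library. The model name `"auto"` and the key prefix `moe-sk-...` are illustrative assumptions — check your deployment for the model names the router actually accepts and use a key issued via the Admin UI or User Portal.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002/v1"  # Orchestrator API (OpenAI-compatible)
API_KEY = "moe-sk-your-key-here"       # placeholder; issue a real key first

def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the orchestrator.

    "auto" is a hypothetical model name standing in for whatever routing
    alias your deployment exposes.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a running stack):
#     with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#         print(json.load(resp))
```

The request shape is identical to what the official OpenAI SDKs emit, which is why clients like Continue.dev can point at this endpoint unchanged.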
## 7B Ensemble — GPT-4o Class Performance, Self-Hosted
Benchmark result (April 2026): 8 domain-specialist 7–9B models on legacy Tesla M10 GPUs achieve 6.11 / 10 on MoE-Eval — the same score class as GPT-4o mini — with zero data leaving the cluster. Three consecutive overnight epochs, 36 scenarios, 0 failures.
| | Single 7B | 8× 7B Ensemble | 30B+14B Orchestrated | H200 Cloud (120B) |
|---|---|---|---|---|
| MoE-Eval Score | 3.3–3.6 / 10 | 6.11 / 10 | 7.60 / 10 | 9.00 / 10 |
| VRAM required | 8 GB | 88 GB (distributed) | 80 GB RTX cluster | 141 GB (H200) |
| Data sovereignty | ✅ | ✅ | ✅ | ❌ Cloud |
| Per-token cost | €0 | €0 | €0 | Metered |
The key insight: specialisation beats scale. meditron:7b handles medical QA better than
a general-purpose 14B model; mathstral:7b outperforms general models on MATH tasks; qwen2.5-coder:7b
leads SWE-bench in its class. Routing each sub-task to its specialist model compounds these
advantages without requiring any single model to be large enough to cover all domains.
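The routing idea can be sketched in a few lines. This is a deliberately naive keyword classifier, not the orchestrator's actual router (which uses an LLM-based pipeline); the keyword sets and the `general:14b` fallback name are illustrative assumptions. The specialist model names come from the paragraph above.

```python
# Hypothetical sketch of domain-based routing; NOT the shipped router.
SPECIALISTS = {
    "medical": "meditron:7b",       # medical QA specialist
    "math": "mathstral:7b",         # MATH-task specialist
    "code": "qwen2.5-coder:7b",     # SWE-bench class leader
}

# Toy keyword sets standing in for a real classifier.
KEYWORDS = {
    "medical": {"diagnosis", "symptom", "dosage", "patient"},
    "math": {"integral", "prove", "equation", "theorem"},
    "code": {"function", "bug", "refactor", "stacktrace"},
}

def route(task: str, default: str = "general:14b") -> str:
    """Pick the specialist whose keyword set best matches the task."""
    words = set(task.lower().split())
    scores = {domain: len(words & kws) for domain, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a generalist when no specialist matches at all.
    return SPECIALISTS[best] if scores[best] > 0 else default
```

Each sub-task only ever needs the expertise of one small model, which is why the ensemble fits on legacy GPUs while scoring in a much larger model's class.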
→ Full benchmark report and LLM comparison
## CLI Agents — Best Of
MoE Sovereign works with any OpenAI-compatible client, but execution-loop agents like Aider, Open Interpreter, and Continue.dev unlock the full capability stack: correction memory, semantic caching, domain-expert routing, and the Knowledge Graph all activate through their natural try → fail → fix loops.
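The try → fail → fix loop that activates correction memory can be illustrated abstractly. This is a conceptual sketch, not the system's implementation: the function names and the in-memory dict standing in for the correction store are assumptions made for illustration.

```python
# Conceptual sketch of the try -> fail -> fix loop behind correction memory.
# Every failed attempt is recorded, so later attempts (and future sessions
# hitting the same task) are steered away from known-bad answers.

def try_fail_fix(task, attempt, is_success, memory, max_tries=3):
    """Run `attempt` until `is_success` passes, remembering failures.

    `memory` maps a task to the list of results that already failed;
    `attempt` receives that list so it can avoid repeating them.
    """
    for _ in range(max_tries):
        result = attempt(task, memory.get(task, []))
        if is_success(result):
            return result
        memory.setdefault(task, []).append(result)  # correction memory
    return None
```

Execution-loop agents drive exactly this cycle for free: they run the generated code, observe the failure, and retry — each round enriching the failure record that a one-shot chat client never produces.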
| Page | What it covers |
|---|---|
| CLI Agents — Best Of | Plain-language explanation of why and how, Before/After comparison, connection examples for each tool |
| Architectural Deep Dive | Delta table, Mermaid data-flow diagrams, measured thresholds from the implementation |
## Connecting with Claude Code

Point Claude Code at the orchestrator by adding the endpoint to your settings file (typically `~/.claude/settings.json`):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8002/v1",
    "ANTHROPIC_API_KEY": "moe-sk-..."
  }
}
```
Alternatively, configure a profile in the Admin UI under Profiles and enable it.
## Documentation Structure

```mermaid
graph LR
    D[docs/]
    D --> IDX[index.md<br/>this page]
    D --> FAQ["faq.md<br/>Frequently asked questions<br/>(Claude Code, API, troubleshooting)"]
    D --> CL[changelog.md<br/>Version history]
    D --> G[guide/]
    D --> A[admin/]
    D --> P[portal/]
    D --> R[reference/]
    G --> GIDX[index.md<br/>User handbook – overview]
    G --> GQ[quickstart.md<br/>Services, pipeline, getting started]
    G --> GH[handout.md<br/>Complete user handbook]
    G --> GA["api.md<br/>API access, keys, curl & SDK examples"]
    A --> AIDX[index.md<br/>Admin backend documentation]
    P --> PIDX[index.md<br/>User portal documentation]
    R --> RA["auth.md<br/>Authentication (OIDC, API key)"]
    R --> REP[expert-prompts.md<br/>System prompts for all expert roles]
    R --> RI[import-export.md<br/>JSON schemas for templates and profiles]
```
## Stack
| Component | Role |
|---|---|
| LangGraph | Pipeline orchestration |
| Ollama | Local LLM inference |
| ChromaDB | Semantic vector cache |
| Valkey | Checkpoints, budget counters, scoring |
| Neo4j 5 | Knowledge graph (GraphRAG) |
| Apache Kafka | Event streaming & async learning |
| Prometheus + Grafana | Metrics & dashboards |
| FastAPI + uvicorn | HTTP API layer |
| PostgreSQL | User database |