
Tool Stack Overview

Sovereign MoE combines several specialized open-source components into a coherent orchestration stack. Each component solves a specific problem that could not be solved with a single, monolithic LLM system.

Architecture Diagram

flowchart TD
    CLIENT["Client\n(Open WebUI · curl · SDK)"]

    subgraph ORCH["LangGraph Orchestrator · Port 8002"]
        direction TB
        CACHE["cache_lookup\n(ChromaDB Semantic)"]
        PLAN["planner\n(Judge-LLM)"]
        WORKERS["workers\n(Expert LLMs)"]
        RESEARCH["research\n(SearXNG)"]
        MATH["math\n(SymPy internal)"]
        MCP_N["mcp\n(Precision Tools)"]
        GRAPH_N["graph_rag\n(Neo4j)"]
        MERGER["merger\n(Judge-LLM)"]
        THINKING["thinking\n(CoT, conditional)"]
        CRITIC["critic\n(fact-check)"]
    end

    subgraph INFERENCE["Inference Servers (configured via Admin UI)"]
        direction LR
        SRV1["Inference Server 1\nOllama-compatible"]
        SRV2["Inference Server 2\noptional"]
    end

    subgraph PERSIST["Persistence Layer"]
        REDIS[("Valkey\nPort 6379\nScoring · Session Cache")]
        CHROMA[("ChromaDB\nPort 8001\nSemantic Cache")]
        NEO4J[("Neo4j\nPort 7687\nKnowledge Graph")]
    end

    subgraph STREAMING["Async Streaming"]
        KAFKA[("Kafka\nPort 9092\nmoe.ingest · moe.requests · moe.feedback")]
        KCONS["Kafka Consumer\n→ Neo4j Ingest"]
        KAFKA --> KCONS --> NEO4J
    end

    MCP_SERVER["MCP Precision Tools\nPort 8003\n16 deterministic tools"]
    SEARXNG["SearXNG\nPort 8888\nPrivate web search"]

    CLIENT -->|"POST /v1/chat/completions"| CACHE
    CACHE -->|"Hit"| CLIENT
    CACHE -->|"Miss"| PLAN
    PLAN --> WORKERS & RESEARCH & MATH & MCP_N & GRAPH_N
    WORKERS --> INFERENCE
    PLAN --> INFERENCE
    MERGER --> INFERENCE
    RESEARCH --> SEARXNG
    MCP_N --> MCP_SERVER
    GRAPH_N --> NEO4J
    WORKERS -->|"Confidence < threshold"| THINKING
    THINKING --> MERGER
    WORKERS & RESEARCH & MATH & MCP_N & GRAPH_N --> MERGER
    MERGER --> CRITIC --> CLIENT
    MERGER --> CHROMA
    MERGER --> REDIS
    MERGER -->|"moe.ingest + moe.requests"| KAFKA
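The first hop in the graph, cache_lookup, decides between the Hit and Miss edges shown above. A minimal sketch of that decision using plain cosine similarity — the function names, the in-memory cache shape, and the 0.92 cutoff are illustrative stand-ins, not the actual ChromaDB client code or configured threshold:

```python
import math

SIMILARITY_THRESHOLD = 0.92  # illustrative cutoff, not the real config value

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cache_lookup(query_vec, cache):
    """Return the cached answer on a semantic hit, else None (hand off to planner)."""
    best_score, best_answer = 0.0, None
    for vec, answer in cache:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_score, best_answer = score, answer
    if best_score >= SIMILARITY_THRESHOLD:
        return best_answer   # "Hit" edge: respond directly to the client
    return None              # "Miss" edge: continue to the planner node

cache = [([1.0, 0.0], "cached answer")]
print(cache_lookup([0.99, 0.14], cache))  # near-duplicate query -> hit
print(cache_lookup([0.0, 1.0], cache))    # unrelated query -> miss
```

In the real stack the nearest-neighbour search runs inside ChromaDB; only the threshold decision lives in the orchestrator.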

Component Overview

| Component  | Role                                              | Port     | Documentation     |
|------------|---------------------------------------------------|----------|-------------------|
| LangGraph  | Orchestration, parallel fan-out, state management | internal | langgraph.md      |
| Ollama     | Multi-node LLM inference                          | 11434    | ollama_cluster.md |
| Neo4j      | Temporal GraphRAG, knowledge graph                | 7687     | graphrag_neo4j.md |
| Valkey     | Expert scoring, session cache                     | 6379     |                   |
| ChromaDB   | Semantic response cache                           | 8001     |                   |
| Kafka      | Async ingest buffer, audit log                    | 9092     | Kafka docs        |
| SearXNG    | Private web search (no Google tracking)           | 8888     |                   |
| MCP Server | 16 deterministic precision tools                  | 8003     | mcp_tools.md      |
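A quick way to smoke-test a deployment is a plain TCP connect against the ports in the table. This is a generic sketch (host and timeout are assumptions; Neo4j's 7687 is the Bolt protocol, so a successful connect proves reachability, not health):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports from the component table above
SERVICES = {
    "Ollama": 11434,
    "Neo4j": 7687,
    "Valkey": 6379,
    "ChromaDB": 8001,
    "Kafka": 9092,
    "SearXNG": 8888,
    "MCP Server": 8003,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_open("localhost", port) else "DOWN"
        print(f"{name:12s} :{port:<6d} {state}")
```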

Design Principles

Determinism over LLM estimation — calculations, hashes, date operations, and network subnet calculations always run through the MCP server, never through a language model.
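As an illustration of the principle, a subnet calculation done the MCP way is ordinary deterministic code rather than a model estimate. This is a sketch using Python's stdlib `ipaddress` module, not the actual MCP tool implementation:

```python
import ipaddress

def subnet_info(cidr: str) -> dict:
    """Deterministic subnet facts -- same input, same output, every time."""
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        "usable_hosts": max(net.num_addresses - 2, 0),
    }

print(subnet_info("192.168.10.77/26"))
# network 192.168.10.64, broadcast 192.168.10.127, 62 usable hosts
```

An LLM asked the same question may get the broadcast address wrong one time in twenty; the tool never does, which is why such queries are routed to the MCP node instead of a worker.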

Decoupling via Kafka — the HTTP response path and data persistence are fully separated. A Kafka outage never blocks a response; it only delays later graph learning.
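The decoupling can be sketched as a fire-and-forget publish: the response is assembled first, and the Kafka produce is wrapped so that a broker failure degrades to a log line. The producer class below is a stand-in for an unreachable broker, not the actual client code:

```python
import logging

log = logging.getLogger("moe")

class DownProducer:
    """Stand-in for a Kafka producer whose broker is unreachable."""
    def send(self, topic: str, value: bytes) -> None:
        raise ConnectionError("broker unreachable")

def handle_request(answer: str, producer) -> str:
    # 1. HTTP response path: always completes.
    response = answer
    # 2. Persistence path: best-effort, never blocks the response.
    try:
        producer.send("moe.ingest", value=response.encode())
    except Exception:
        log.warning("Kafka unavailable; graph learning deferred")
    return response

print(handle_request("42", DownProducer()))  # still returns "42"
```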

Heterogeneous hardware — Ollama abstracts different GPU generations (from consumer cards to enterprise Tesla GPUs) behind a unified OpenAI-compatible API. Inference servers are configured via Admin UI → Servers, with priority routing weighted by availability.
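The priority routing mentioned above can be sketched as a weighted pick over the configured servers, where a server's effective weight is its configured priority scaled by measured availability. Field names and the weighting formula are illustrative; the actual Admin UI schema may differ:

```python
import random

def pick_server(servers):
    """Weighted random choice: priority scaled by availability (0..1).
    Servers with effective weight 0 are never selected."""
    weights = [s["priority"] * s["availability"] for s in servers]
    if sum(weights) == 0:
        raise RuntimeError("no inference server available")
    return random.choices(servers, weights=weights, k=1)[0]

servers = [
    {"name": "srv1", "priority": 3, "availability": 1.0},  # healthy node
    {"name": "srv2", "priority": 1, "availability": 0.0},  # currently down
]
print(pick_server(servers)["name"])  # always srv1 while srv2 is down
```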

No vendor lock-in — all components are self-hosted. SearXNG instead of Google, Ollama instead of OpenAI, Neo4j Community instead of vector-based cloud services.