Reference Template: moe-distributed-enterprise

The moe-distributed-enterprise template shows a fully configured, production-ready deployment on distributed hardware with 5+ GPU nodes. It serves as a reference configuration for the planner, the judge, and all expert system prompts.

Overview

Property	Value
Template ID	`tmpl-d2300eb6`
Name	`moe-distributed-enterprise`
Description	Tuned parallelized MoE infrastructure LLM ensemble
Planner model	`gemma4:31b` @ N04-RTX
Judge/Merger model	`llama-3-3-70b` @ AIHUB

Planner System Prompt

The planner decomposes each request into 1–4 subtasks and emits structured JSON only:

MoE orchestrator. Decompose the request into 1–4 subtasks.

Rules:
1. Extract every numeric/technical constraint → IMMUTABLE_CONSTANTS;
   insert them into each subtask description.
2. Exactly one category per subtask:
   general|math|code_reviewer|technical_support|legal_advisor|medical_consult|
   creative_writer|data_analyst|reasoning|science|translation|vision
3. Trivial/single-step: 1 subtask. Multi-step/interdisciplinary: 2–4.

Output (JSON only, no prose):
{"tasks":[{"id":1,"category":"X","description":"… [IMMUTABLE_CONSTANTS: …]","mcp":true|false,"web":true|false}]}

Design principles:

IMMUTABLE_CONSTANTS prevents numeric values from being distorted by expert LLMs
The JSON-only requirement eliminates parse errors in the planner retry loop
mcp: true directly controls whether the MCP node runs for the subtask
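
The output contract above can be checked mechanically before a plan is handed to the experts. The following is a minimal sketch, assuming the planner's reply arrives as a raw string. The helper name `validate_plan` is hypothetical and not part of the template; the checks mirror the prompt's rules rather than the actual retry-loop code.

```python
import json

# Category names copied from rule 2 of the planner prompt.
ALLOWED_CATEGORIES = {
    "general", "math", "code_reviewer", "technical_support",
    "legal_advisor", "medical_consult", "creative_writer",
    "data_analyst", "reasoning", "science", "translation", "vision",
}


def validate_plan(raw: str) -> list:
    """Parse planner output and return its tasks, or raise ValueError.

    Hypothetical helper: illustrates the contract, not the real implementation.
    """
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"planner output is not valid JSON: {exc}") from exc

    tasks = plan.get("tasks") if isinstance(plan, dict) else None
    if not isinstance(tasks, list) or not 1 <= len(tasks) <= 4:
        raise ValueError("expected 1-4 entries under 'tasks'")

    for task in tasks:
        if not isinstance(task, dict):
            raise ValueError("each task must be a JSON object")
        category = task.get("category")
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if not isinstance(task.get("description"), str):
            raise ValueError("task description must be a string")
        for flag in ("mcp", "web"):
            if not isinstance(task.get(flag), bool):
                raise ValueError(f"'{flag}' must be true or false")
    return tasks
```

A caller could catch `ValueError` and re-prompt the planner, which is one plausible way a retry loop might use such a check.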

Judge / Merger System Prompt

The judge synthesizes all expert results into the final answer:

Synthesize all inputs into one complete response in the user's language.
Priority: MCP > Graph > CONFIDENCE:high experts > Web > CONFIDENCE:medium experts > CONFIDENCE:low/Cache.
On contradiction with MCP/Graph: discard the expert statement without comment.

Cross-domain validator:
→ Check every numeric value against the original request; on deviation, the original wins.
→ GAPS from expert outputs: name them explicitly, never hallucinate.
→ Unprocessed subtasks (no expert output): mark as a gap.

Design principles:

An explicit priority chain prevents low-confidence experts from overriding MCP facts
The cross-domain validator acts as an embedded self-check routine
The GAPS requirement replaces hallucinations with transparent knowledge gaps
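
The priority chain in the judge prompt is enforced by the LLM itself, but the same ordering can be sketched in code, for example to pre-sort inputs before they are placed into the merger prompt. This is an illustration only: the source labels (`mcp`, `graph`, `expert_high`, and so on) and the function `rank_inputs` are assumed names, not identifiers from the template.

```python
# Lower rank = higher priority, following the judge prompt's chain:
# MCP > Graph > CONFIDENCE:high experts > Web > CONFIDENCE:medium experts
# > CONFIDENCE:low/Cache.
# The keys are hypothetical labels chosen for this sketch.
PRIORITY = {
    "mcp": 0,
    "graph": 1,
    "expert_high": 2,
    "web": 3,
    "expert_medium": 4,
    "expert_low": 5,
    "cache": 5,
}


def rank_inputs(inputs: list) -> list:
    """Return inputs ordered from most to least authoritative.

    sorted() is stable, so inputs with equal rank keep their original order.
    """
    return sorted(inputs, key=lambda item: PRIORITY[item["source"]])


inputs = [
    {"source": "web", "text": "search snippet"},
    {"source": "expert_low", "text": "uncertain expert answer"},
    {"source": "mcp", "text": "precision tool result"},
]
ranked = rank_inputs(inputs)  # the MCP entry comes first
```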

Expert System Prompts

All experts in the template follow a common structural principle:

[Role and domain] · [Output format] · [Quality criterion]
GAPS:[topic|none] · REFER:[recommendation|—]

The GAPS/REFER suffix at the end of each prompt is mandatory: it lets the merger handle knowledge gaps explicitly instead of filling them with hallucinated content.
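
To show how a downstream component might consume this suffix, here is a minimal parsing sketch. It assumes that the bracket notation in the prompt denotes placeholders and that an expert ends its response with a line such as `GAPS:none · REFER:—`. The function name `parse_trailer` and the returned dictionary layout are hypothetical.

```python
import re

# Matches a trailer line such as "GAPS:none · REFER:—" (assumed output shape).
TRAILER = re.compile(
    r"^GAPS:[ \t]*(?P<gaps>.*?)[ \t]*·[ \t]*REFER:[ \t]*(?P<refer>.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_trailer(response: str) -> dict:
    """Extract the GAPS/REFER trailer from an expert response.

    Uses the last matching line; returns None for 'none' and for the dash
    placeholder so that callers can test for real gaps or referrals.
    """
    matches = list(TRAILER.finditer(response))
    if not matches:
        return {"trailer_found": False, "gaps": None, "refer": None}
    last = matches[-1]
    gaps = last.group("gaps")
    refer = last.group("refer")
    return {
        "trailer_found": True,
        "gaps": None if gaps.lower() in {"none", ""} else gaps,
        "refer": None if refer in {"—", "-", ""} else refer,
    }
```

A response without a trailer is reported via `trailer_found` rather than raising, so a merger could treat the missing suffix itself as a gap.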

general — General Knowledge

Model: glm-4.7-flash:latest @ N04-RTX

Generalist expert: fact-based. Separate facts from interpretation;
state knowledge limits explicitly.
GAPS:[topic|none] · REFER:[specialist category|—]

math — Mathematics & Physics

Model: qwq:32b @ N04-RTX

Mathematics and physics expert.
Steps: numbered, complete. Formulas: LaTeX ($...$).
Result: verification or dimensional analysis.
Ambiguous problem statement: name and solve all variants.
GAPS:[topic|none] · REFER:[science|technical_support|—]

MCP priority

When an active MCP result is present (precision_tools): the MCP value is authoritative. The expert comments and explains, but does not recalculate.

technical_support — IT & DevOps

Model: qwen3:32b @ N04-RTX

Senior IT/DevOps. Immediately executable solutions: exact commands,
configuration syntax, error codes.
State preconditions and side effects. Battle-tested > experimental.
GAPS:[topic|none] · REFER:[code_reviewer|—]

code_reviewer — Code Analysis & Security

Model: qwen3-coder:30b @ N06-M10

Senior SWE: correctness, security (OWASP Top 10), performance, maintainability.
Output:
1. Issues: [CRITICAL|HIGH|LOW] – root cause and risk
2. Corrected snippet (complete; inline comment per change: `// #N: reason`)
3. Do not repeat unchanged code outside the snippet.
GAPS:[topic|none] · REFER:[technical_support|—]

creative_writer — Text Creation

Model: mistral-small:24b @ N06-M10

Stylistically confident author. Match register/tone exactly as specified:
factual to poetic. No filler.

medical_consult — Medical Information

Model: meditron:7b @ N09-M60

Medical specialist: factual information based on S3/AWMF/WHO guidelines.
Separate established knowledge from ongoing research.
Mandatory closing: "Not a substitute for professional medical diagnosis or treatment."
GAPS:[topic|none] · REFER:[science|—]

Critic node

Responses from the medical_consult expert always pass through the critic node for safety-critical fact checking.

legal_advisor — German Law

Model: sroecker/sauerkrautlm-7b-hero:latest @ N09-M60

Lawyer (German law: BGB, StGB, GDPR, HGB, etc.).
Cite relevant §§ and leading BGH/BVerfG case law.
Distinguish statute from interpretation.
Mandatory closing: "Not a substitute for individual legal advice."
GAPS:[topic|none] · REFER:[general|—]

Critic node

legal_advisor responses also pass through the critic node. Statutory texts are retrieved exactly via MCP (legal_get_paragraph).

translation — Translation

Model: translategemma:27b @ N04-RTX

Professional translator (DE↔EN↔FR↔ES↔IT). Idiomatic,
preserving the original's tone, register, technical terminology, and rhythm.
Non-equivalent terms: [translator's note: …].

reasoning — Complex Analysis

Model: deepseek-r1:32b @ N04-RTX

Analytical problem solver (multi-step questions).
Output: numbered steps → assumptions → knowledge limits →
alternative interpretations → justified conclusion.
Correct > fast.
GAPS:[topic|none] · REFER:[category|—]

vision — Image & Document Analysis

Model: qwen2.5vl:32b @ N06-M10

Vision expert (image and document analysis).
Output: content → context → details.
Text in images: transcribe verbatim and complete.
Diagrams/charts: extract data points and explain the message.
UI screenshots: name elements, error states, actions.
GAPS:[topic|none] · REFER:[data_analyst|—]

data_analyst — Data Analysis

Model: phi4:14b @ N07-GT

Data science expert. Analyze structure, patterns, statistics.
Python (pandas/numpy/matplotlib) when needed.
Interpret the result; state limitations
(N, bias, causation vs. correlation).
GAPS:[topic|none] · REFER:[math|science|—]

science — Natural Sciences

Model: command-r:35b @ N09-M60

Natural scientist (chemistry, biology, physics, environment).
Basis: current research and accepted theories.
Distinguish settled knowledge from active research areas.
Explain technical terms on first use.
GAPS:[topic|none] · REFER:[math|medical_consult|—]

Hardware Mapping

The template distributes experts by model size and GPU capacity:

GPU node	Models	Experts
N04-RTX	`glm-4.7-flash`, `qwq:32b`, `qwen3:32b`, `translategemma:27b`, `deepseek-r1:32b`, `gemma4:31b` (planner)	general, math, technical_support, translation, reasoning + planner
N06-M10	`mistral-small:24b`, `qwen3-coder:30b`, `qwen2.5vl:32b`	creative_writer, code_reviewer, vision
N07-GT	`phi4:14b`	data_analyst
N09-M60	`meditron:7b`, `sauerkrautlm-7b-hero`, `command-r:35b`	medical_consult, legal_advisor, science
AIHUB	`llama-3-3-70b`	Judge/Merger
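
The mapping can also be expressed as data, which makes it easy to verify that the per-expert sections and the table agree. The sketch below transcribes the model and node of each expert from the sections above; the dictionary layout and the function `experts_by_node` are illustrative assumptions, not the template's actual storage format.

```python
from collections import defaultdict

# category -> (model, GPU node), transcribed from the expert sections.
EXPERTS = {
    "general": ("glm-4.7-flash:latest", "N04-RTX"),
    "math": ("qwq:32b", "N04-RTX"),
    "technical_support": ("qwen3:32b", "N04-RTX"),
    "code_reviewer": ("qwen3-coder:30b", "N06-M10"),
    "creative_writer": ("mistral-small:24b", "N06-M10"),
    "medical_consult": ("meditron:7b", "N09-M60"),
    "legal_advisor": ("sroecker/sauerkrautlm-7b-hero:latest", "N09-M60"),
    "translation": ("translategemma:27b", "N04-RTX"),
    "reasoning": ("deepseek-r1:32b", "N04-RTX"),
    "vision": ("qwen2.5vl:32b", "N06-M10"),
    "data_analyst": ("phi4:14b", "N07-GT"),
    "science": ("command-r:35b", "N09-M60"),
}


def experts_by_node(experts: dict) -> dict:
    """Group expert categories by the GPU node that serves them."""
    grouped = defaultdict(list)
    for category, (_model, node) in experts.items():
        grouped[node].append(category)
    # Sort for a stable, comparable result.
    return {node: sorted(categories) for node, categories in grouped.items()}


by_node = experts_by_node(EXPERTS)
```

The planner (`gemma4:31b` @ N04-RTX) and the judge/merger (`llama-3-3-70b` @ AIHUB) are left out of the dictionary because they are not expert categories.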

Thought-Stream Visibility

When using this template (or any other), the following information is visible in the thinking panel of Open WebUI:

Emoji	Event	Visible content
🎯	Skill resolution	Skill name, arguments, resolved prompt (complete)
📋	Planner prompt	Complete prompt to planner LLM incl. system role, rules, few-shot examples
📋	Planner result	JSON plan with all subtasks and categories
📤	Expert system prompt	Complete system prompt of the expert + task text
🚀	Expert call	Model name, category, GPU node, task
✅	Expert response	Complete, unfiltered LLM response incl. GAPS/REFER
⚡	T1/T2 routing	Which tier runs, whether T2 escalation is triggered
⚙️	MCP call	Tool name + complete arguments as JSON
⚙️	MCP result	Complete result from the precision tool server
🌐	Web research	Search query + complete result with sources
🔗	GraphRAG	Neo4j query + structured context extract
🧠	Reasoning prompt	Complete chain-of-thought prompt
🧠	Reasoning result	Complete CoT trace (problem decomposition, source evaluation, conclusion)
🔄	Judge refinement prompt	Prompt for refinement round (for low-confidence experts)
🔄	Judge refinement response	Feedback text from judge + confidence delta
🔀	Merger prompt	Complete synthesis prompt incl. all expert results
🔀	Merger response	Complete judge/merger output before critic
🔎	Critic prompt	Fact-check prompt for safety-critical domains
🔎	Critic response	Check result: `CONFIRMED` or corrected answer
⚠️	Low confidence	Category + confidence level of affected experts
💨	Fast path	Direct pass-through without merger (single high-confidence expert)
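
Two of the routing events in the table, the fast path and the critic, can be illustrated with a small decision sketch. It rests on assumptions: the real routing conditions are not documented here, the function `post_expert_steps` and the result fields are hypothetical, and applying the critic after a fast-path response is inferred only from the statement that medical_consult and legal_advisor responses always pass through the critic node.

```python
# Categories whose responses pass through the critic node, per the
# medical_consult and legal_advisor sections.
CRITIC_CATEGORIES = {"medical_consult", "legal_advisor"}


def post_expert_steps(results: list) -> list:
    """Return the processing steps that follow the expert calls.

    Each result is a dict with 'category' and 'confidence' keys
    (an assumed shape for this sketch).
    """
    steps = []
    # Fast path: a single high-confidence expert bypasses the merger.
    if len(results) == 1 and results[0]["confidence"] == "high":
        steps.append("fast_path")
    else:
        steps.append("merger")
    # Safety-critical domains are fact-checked by the critic.
    if any(result["category"] in CRITIC_CATEGORIES for result in results):
        steps.append("critic")
    return steps
```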

No black box

Every processing step — from skill resolution through all LLM calls to the final fact check — is visible in the stream. Even on slow hardware, progress can be tracked without gaps.