Reference Template: moe-distributed-enterprise
The moe-distributed-enterprise template shows a fully configured,
production-ready deployment on distributed hardware with 5+ GPU nodes.
It serves as a reference for all system prompts as well as for the planner and judge configurations.
Overview
| Property | Value |
|---|---|
| Template ID | tmpl-d2300eb6 |
| Name | moe-distributed-enterprise |
| Description | Tuned parallelized MoE infrastructure LLM ensemble |
| Planner model | gemma4:31b @ N04-RTX |
| Judge/Merger model | llama-3-3-70b @ AIHUB |
Planner System Prompt
The planner decomposes each request into 1–4 subtasks and emits only structured JSON:
MoE orchestrator. Decompose the request into 1–4 subtasks.
Rules:
1. Extract every numeric/technical constraint → IMMUTABLE_CONSTANTS;
insert them into each subtask description.
2. Exactly one category per subtask:
general|math|code_reviewer|technical_support|legal_advisor|medical_consult|
creative_writer|data_analyst|reasoning|science|translation|vision
3. Trivial/single-step: 1 subtask. Multi-step/interdisciplinary: 2–4.
Output (JSON only, no prose):
{"tasks":[{"id":1,"category":"X","description":"… [IMMUTABLE_CONSTANTS: …]","mcp":true|false,"web":true|false}]}
Design principles:
- IMMUTABLE_CONSTANTS prevents numeric values from being distorted by expert LLMs
- The JSON-only requirement eliminates parse errors in the planner retry loop
- mcp: true directly controls whether the MCP node runs for the subtask
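The rules in the planner prompt can be checked mechanically before a plan is handed to the experts. The following is a minimal sketch, not the template's actual implementation: `parse_plan` is a hypothetical helper, and raising `ValueError` so that a caller can re-prompt the planner is an assumption about how a retry loop might consume it.

```python
import json
from typing import Any, Dict, List

# Category list copied from rule 2 of the planner prompt.
ALLOWED_CATEGORIES = {
    "general", "math", "code_reviewer", "technical_support", "legal_advisor",
    "medical_consult", "creative_writer", "data_analyst", "reasoning",
    "science", "translation", "vision",
}


def parse_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse planner output and enforce the rules stated in the prompt.

    Hypothetical helper: raises ValueError on any violation.
    """
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"planner output is not valid JSON: {exc}") from exc

    tasks = plan.get("tasks") if isinstance(plan, dict) else None
    if not isinstance(tasks, list) or not 1 <= len(tasks) <= 4:
        raise ValueError("plan must contain 1-4 subtasks")

    for task in tasks:
        if not isinstance(task, dict):
            raise ValueError("each subtask must be a JSON object")
        category = task.get("category")
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if not isinstance(task.get("description"), str):
            raise ValueError("description must be a string")
        for flag in ("mcp", "web"):
            if not isinstance(task.get(flag), bool):
                raise ValueError(f"{flag} must be true or false")
    return tasks
```

A caller could wrap `parse_plan` in a bounded loop and send the error message back to the planner on failure; the retry count and the feedback format are left open here because the source does not specify them.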
Judge / Merger System Prompt
The judge synthesizes all expert results into the final answer:
Synthesize all inputs into one complete response in the user's language.
Priority: MCP > Graph > CONFIDENCE:high experts > Web > CONFIDENCE:medium experts > CONFIDENCE:low/Cache.
On contradiction with MCP/Graph: discard the expert statement without comment.
Cross-domain validator:
→ Check every numeric value against the original request; on deviation, the original wins.
→ GAPS from expert outputs: name them explicitly, never hallucinate.
→ Unprocessed subtasks (no expert output): mark as a gap.
Design principles:
- Explicit priority chain prevents low-confidence experts from overriding MCP facts
- Cross-domain validator as embedded self-check routine
- GAPS requirement replaces hallucinations with transparent knowledge gaps
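The priority chain from the judge prompt can be expressed as a simple ranking. This is an illustrative sketch only: in the template the ordering is enforced by the prompt itself, and the `Evidence` class and the source labels used here are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rank per source, following the chain in the judge prompt:
# MCP > Graph > high-confidence experts > Web > medium experts > low/Cache.
# Lower rank means higher priority; low-confidence experts and cache share a tier.
RANK = {
    "mcp": 0,
    "graph": 1,
    "expert:high": 2,
    "web": 3,
    "expert:medium": 4,
    "expert:low": 5,
    "cache": 5,
}


@dataclass
class Evidence:
    source: str                       # "mcp", "graph", "expert", "web" or "cache"
    text: str
    confidence: Optional[str] = None  # "high", "medium" or "low" (experts only)


def rank(item: Evidence) -> int:
    """Return the priority rank of one input."""
    key = f"expert:{item.confidence}" if item.source == "expert" else item.source
    return RANK[key]


def order_for_merge(items: List[Evidence]) -> List[Evidence]:
    """Sort inputs so that the most authoritative ones come first."""
    return sorted(items, key=rank)
```

Because `sorted` is stable, inputs in the same tier keep their original relative order.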
Expert System Prompts
All experts in the template follow a common structural principle:
[Role and domain] · [Output format] · [Quality criterion]
GAPS:[topic|none] · REFER:[recommendation|—]
The GAPS/REFER suffix at the end of each prompt is mandatory: it lets
the merger handle knowledge gaps explicitly instead of filling them with hallucinated content.
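A consumer of expert output could extract the suffix before merging. The sketch below assumes that experts emit the suffix on a single line of the form `GAPS:<topic or none> · REFER:<category or —>`, with or without the square brackets; the exact output format is not specified in this document, and `parse_trailer` is a hypothetical helper.

```python
from typing import Dict, Optional


def parse_trailer(response: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract the GAPS/REFER suffix from an expert response.

    Returns None if no suffix line is found. In the returned dict, a value
    of None means "no gap" or "no referral" respectively.
    """
    # Scan from the end, since the suffix is expected on the last lines.
    for line in reversed(response.strip().splitlines()):
        line = line.strip()
        if line.startswith("GAPS:") and "REFER:" in line:
            gaps_part, _, refer_part = line.partition("REFER:")
            # Drop the "GAPS:" prefix, the separator dot, and optional brackets.
            gaps = gaps_part[len("GAPS:"):].strip(" ·[]")
            refer = refer_part.strip(" []")
            return {
                "gaps": None if gaps.lower() in ("", "none") else gaps,
                "refer": None if refer in ("", "—", "-") else refer,
            }
    return None
```

How a missing suffix should be treated (for example, as an implicit gap) is a design decision the source leaves open, so the sketch only reports its absence.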
general — General Knowledge
Model: glm-4.7-flash:latest @ N04-RTX
Generalist expert: fact-based. Separate facts from interpretation;
state knowledge limits explicitly.
GAPS:[topic|none] · REFER:[specialist category|—]
math — Mathematics & Physics
Model: qwq:32b @ N04-RTX
Mathematics and physics expert.
Steps: numbered, complete. Formulas: LaTeX ($...$).
Result: verification or dimensional analysis.
Ambiguous problem statement: name and solve all variants.
GAPS:[topic|none] · REFER:[science|technical_support|—]
MCP priority
When an active MCP result is present (precision_tools): the MCP value is authoritative.
The expert comments and explains, but does not recalculate.
technical_support — IT & DevOps
Model: qwen3:32b @ N04-RTX
Senior IT/DevOps. Immediately executable solutions: exact commands,
configuration syntax, error codes.
State preconditions and side effects. Battle-tested > experimental.
GAPS:[topic|none] · REFER:[code_reviewer|—]
code_reviewer — Code Analysis & Security
Model: qwen3-coder:30b @ N06-M10
Senior SWE: correctness, security (OWASP Top 10), performance, maintainability.
Output:
1. Issues: [CRITICAL|HIGH|LOW] – root cause and risk
2. Corrected snippet (complete; inline comment per change: `// #N: reason`)
3. Do not repeat unchanged code outside the snippet.
GAPS:[topic|none] · REFER:[technical_support|—]
creative_writer — Text Creation
Model: mistral-small:24b @ N06-M10
Stylistically confident author. Match register/tone exactly as specified:
factual to poetic. No filler.
medical_consult — Medical Information
Model: meditron:7b @ N09-M60
Medical specialist: factual information based on S3/AWMF/WHO guidelines.
Separate established knowledge from ongoing research.
Mandatory closing: "Not a substitute for professional medical diagnosis or treatment."
GAPS:[topic|none] · REFER:[science|—]
Critic node
Responses from the medical_consult expert always pass through the critic node
for safety-critical fact checking.
legal_advisor — German Law
Model: sroecker/sauerkrautlm-7b-hero:latest @ N09-M60
Lawyer (German law: BGB, StGB, GDPR, HGB, etc.).
Cite relevant §§ and leading BGH/BVerfG case law.
Distinguish statute from interpretation.
Mandatory closing: "Not a substitute for individual legal advice."
GAPS:[topic|none] · REFER:[general|—]
Critic node
legal_advisor responses also pass through the critic node.
Statutory texts are retrieved exactly via MCP (legal_get_paragraph).
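The two safety-critical categories share two properties in this template: their responses pass through the critic node, and their prompts require a fixed closing sentence. The sketch below shows how a pipeline might flag both. It is an assumption-laden illustration: `safety_checks` is a hypothetical helper, and checking the closing sentence is not described in the source as part of the critic node.

```python
from typing import Dict

# Categories whose responses pass through the critic node.
CRITIC_CATEGORIES = frozenset({"medical_consult", "legal_advisor"})

# Mandatory closing sentences, copied from the two expert prompts.
MANDATORY_CLOSINGS = {
    "medical_consult": "Not a substitute for professional medical diagnosis or treatment.",
    "legal_advisor": "Not a substitute for individual legal advice.",
}


def safety_checks(category: str, response: str) -> Dict[str, bool]:
    """Report whether a response needs the critic and carries its closing."""
    closing = MANDATORY_CLOSINGS.get(category)
    return {
        "critic": category in CRITIC_CATEGORIES,
        # Categories without a mandatory closing pass this check trivially.
        "closing_present": closing is None or closing in response,
    }
```

The closing is matched anywhere in the response rather than at the very end, because the GAPS/REFER suffix follows it in both prompts.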
translation — Translation
Model: translategemma:27b @ N04-RTX
Professional translator (DE↔EN↔FR↔ES↔IT). Idiomatic,
preserving the original's tone, register, technical terminology, and rhythm.
Non-equivalent terms: [translator's note: …].
reasoning — Complex Analysis
Model: deepseek-r1:32b @ N04-RTX
Analytical problem solver (multi-step questions).
Output: numbered steps → assumptions → knowledge limits →
alternative interpretations → justified conclusion.
Correct > fast.
GAPS:[topic|none] · REFER:[category|—]
vision — Image & Document Analysis
Model: qwen2.5vl:32b @ N06-M10
Vision expert (image and document analysis).
Output: content → context → details.
Text in images: transcribe verbatim and complete.
Diagrams/charts: extract data points and explain the message.
UI screenshots: name elements, error states, actions.
GAPS:[topic|none] · REFER:[data_analyst|—]
data_analyst — Data Analysis
Model: phi4:14b @ N07-GT
Data science expert. Analyze structure, patterns, statistics.
Python (pandas/numpy/matplotlib) when needed.
Interpret the result; state limitations
(N, bias, causation vs. correlation).
GAPS:[topic|none] · REFER:[math|science|—]
science — Natural Sciences
Model: command-r:35b @ N09-M60
Natural scientist (chemistry, biology, physics, environment).
Basis: current research and accepted theories.
Distinguish settled knowledge from active research areas.
Explain technical terms on first use.
GAPS:[topic|none] · REFER:[math|medical_consult|—]
Hardware Mapping
The template distributes experts by model size and GPU capacity:
| GPU node | Models | Experts |
|---|---|---|
| N04-RTX | glm-4.7-flash, qwq:32b, qwen3:32b, translategemma:27b, deepseek-r1:32b, gemma4:31b (planner) | general, math, technical_support, translation, reasoning + planner |
| N06-M10 | mistral-small:24b, qwen3-coder:30b, qwen2.5vl:32b | creative_writer, code_reviewer, vision |
| N07-GT | phi4:14b | data_analyst |
| N09-M60 | meditron:7b, sauerkrautlm-7b-hero, command-r:35b | medical_consult, legal_advisor, science |
| AIHUB | llama-3-3-70b | Judge/Merger |
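The same mapping can be kept as a lookup table keyed by category, from which the per-node view is derived. This is a sketch of one possible representation, not the template's actual configuration format; the model and node names are copied from the expert sections, while `EXPERTS` and `experts_by_node` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# category -> (model, GPU node), copied from the expert sections.
EXPERTS: Dict[str, Tuple[str, str]] = {
    "general": ("glm-4.7-flash:latest", "N04-RTX"),
    "math": ("qwq:32b", "N04-RTX"),
    "technical_support": ("qwen3:32b", "N04-RTX"),
    "code_reviewer": ("qwen3-coder:30b", "N06-M10"),
    "creative_writer": ("mistral-small:24b", "N06-M10"),
    "medical_consult": ("meditron:7b", "N09-M60"),
    "legal_advisor": ("sroecker/sauerkrautlm-7b-hero:latest", "N09-M60"),
    "translation": ("translategemma:27b", "N04-RTX"),
    "reasoning": ("deepseek-r1:32b", "N04-RTX"),
    "vision": ("qwen2.5vl:32b", "N06-M10"),
    "data_analyst": ("phi4:14b", "N07-GT"),
    "science": ("command-r:35b", "N09-M60"),
}


def experts_by_node(experts: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group expert categories by the GPU node that hosts their model."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for category, (_model, node) in experts.items():
        grouped[node].append(category)
    return {node: sorted(categories) for node, categories in grouped.items()}
```

Keeping the category as the key matches how the planner addresses experts: each subtask carries exactly one category, which resolves to one model on one node.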
Thought-Stream Visibility
With this template (or any other), the thinking panel of Open WebUI shows the following information:
| Emoji | Event | Visible content |
|---|---|---|
| 🎯 | Skill resolution | Skill name, arguments, resolved prompt (complete) |
| 📋 | Planner prompt | Complete prompt to planner LLM incl. system role, rules, few-shot examples |
| 📋 | Planner result | JSON plan with all subtasks and categories |
| 📤 | Expert system prompt | Complete system prompt of the expert + task text |
| 🚀 | Expert call | Model name, category, GPU node, task |
| ✅ | Expert response | Complete, unfiltered LLM response incl. GAPS/REFER |
| ⚡ | T1/T2 routing | Which tier runs, whether T2 escalation is triggered |
| ⚙️ | MCP call | Tool name + complete arguments as JSON |
| ⚙️ | MCP result | Complete result from the precision tool server |
| 🌐 | Web research | Search query + complete result with sources |
| 🔗 | GraphRAG | Neo4j query + structured context extract |
| 🧠 | Reasoning prompt | Complete chain-of-thought prompt |
| 🧠 | Reasoning result | Complete CoT trace (problem decomposition, source evaluation, conclusion) |
| 🔄 | Judge refinement prompt | Prompt for refinement round (for low-confidence experts) |
| 🔄 | Judge refinement response | Feedback text from judge + confidence delta |
| 🔀 | Merger prompt | Complete synthesis prompt incl. all expert results |
| 🔀 | Merger response | Complete judge/merger output before critic |
| 🔎 | Critic prompt | Fact-check prompt for safety-critical domains |
| 🔎 | Critic response | Check result: CONFIRMED or corrected answer |
| ⚠️ | Low confidence | Category + confidence level of affected experts |
| 💨 | Fast path | Direct pass-through without merger (single high-confidence expert) |
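The fast-path and low-confidence events in the table imply a routing decision after the experts have answered. The sketch below is a guess at that decision based only on the table's wording: the route names, the `route` function, and the rule that any low-confidence expert triggers a refinement round are assumptions, not documented behavior.

```python
from typing import List, Tuple


def route(results: List[Tuple[str, str]]) -> str:
    """Pick a post-expert route from (category, confidence) pairs.

    Hypothetical route names: "fast_path", "refine_then_merge", "merge".
    """
    # A single high-confidence expert is passed through without the merger.
    if len(results) == 1 and results[0][1] == "high":
        return "fast_path"
    # Low-confidence experts get a judge refinement round before merging.
    if any(confidence == "low" for _category, confidence in results):
        return "refine_then_merge"
    return "merge"
```

For example, `route([("math", "high")])` selects the fast path, while a plan with two experts always goes through the merger.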
No black box
Every processing step — from skill resolution through all LLM calls to the final fact check — is visible in the stream. Even on slow hardware, progress can be tracked without gaps.