Universal Observability with Grafana Alloy¶
One of the hardest parts of a multi-tier deployment is correlating logs and traces across tiers. A user request that enters the orchestrator on an LXC edge node, crosses into a Kafka cluster on Kubernetes, and ends in a Postgres write on an external DB host should be visible as a single trace in your observability backend.
MoE Sovereign solves this with a single Grafana Alloy config (deploy/alloy/alloy.river)
and W3C traceparent-header propagation in the orchestrator middleware.
The universal pipeline¶
```mermaid
flowchart TB
  subgraph LXC[LXC edge node]
    O1[orchestrator<br/>UID 1001]
    J1[journald]
    A1[Alloy<br/>systemd service]
    O1 -- stdout --> J1 --> A1
  end
  subgraph COMPOSE[Docker Compose host]
    O2[orchestrator]
    D2[docker.sock]
    A2[Alloy sidecar]
    O2 --> D2 --> A2
  end
  subgraph K8S[Kubernetes cluster]
    O3[orchestrator pods]
    K3[kubelet /var/log]
    A3[Alloy DaemonSet]
    O3 --> K3 --> A3
  end
  subgraph BACK[Observability backend]
    L[(Loki)]
    T[(Tempo)]
    P[(Prometheus /<br/>remote_write)]
    G[(Grafana)]
  end
  A1 -- loki.write --> L
  A2 -- loki.write --> L
  A3 -- loki.write --> L
  A1 -- OTLP --> T
  A2 -- OTLP --> T
  A3 -- OTLP --> T
  A1 & A2 & A3 -- metrics --> P
  L --> G
  T --> G
  P --> G
  classDef edge fill:#ecfdf5,stroke:#059669;
  classDef comp fill:#fef9c3,stroke:#ca8a04;
  classDef k8s fill:#eef2ff,stroke:#6366f1;
  classDef back fill:#fef3c7,stroke:#d97706;
  class LXC,O1,J1,A1 edge;
  class COMPOSE,O2,D2,A2 comp;
  class K8S,O3,K3,A3 k8s;
  class BACK,L,T,P,G back;
```
The same `alloy.river` file runs in all three tiers. Only the log-source
component differs (`loki.source.journal` for LXC, `loki.source.docker` for
Compose, `loki.source.kubernetes` for k8s); everything downstream is identical.
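As a concrete sketch, the LXC variant of that pipeline could look like the following. The component names, the systemd unit filter, and the `enrich` stage shown here are illustrative; the real `deploy/alloy/alloy.river` may differ.

```river
// LXC tier: tail the orchestrator's logs from journald.
// The unit name is an assumption for illustration.
loki.source.journal "orchestrator" {
  matches    = "_SYSTEMD_UNIT=moe-orchestrator.service"
  forward_to = [loki.process.enrich.receiver]
}

// Everything from here down is identical across tiers:
// attach the cluster/host labels and ship to Loki.
loki.process "enrich" {
  stage.static_labels {
    values = {
      cluster = sys.env("MOE_CLUSTER"),
      host    = sys.env("MOE_HOSTNAME"),
    }
  }
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = sys.env("LOKI_URL")
  }
}
```

Swapping `loki.source.journal` for `loki.source.docker` or `loki.source.kubernetes` is the only per-tier change; both forward to the same `loki.process.enrich.receiver`.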
Trace-ID propagation¶
```mermaid
sequenceDiagram
  actor Client
  participant Edge as LXC orchestrator
  participant Kafka as Kafka (k8s)
  participant Worker as Worker pod (k8s)
  participant PG as Postgres (external)
  participant Loki
  Client->>Edge: POST /v1/chat/completions
  Note over Edge: middleware reads traceparent,<br/>generates one if missing
  Edge->>Edge: log: "request started" (trace_id=abc123)
  Edge->>Kafka: produce event (headers: traceparent=abc123)
  Kafka->>Worker: consume (headers preserved)
  Worker->>Worker: log: "processing" (trace_id=abc123)
  Worker->>PG: INSERT
  Worker->>Worker: log: "done" (trace_id=abc123)
  Edge->>Loki: loki.write (labels: cluster=lxc, trace_id=abc123)
  Worker->>Loki: loki.write (labels: cluster=k8s, trace_id=abc123)
  Note over Loki: one Loki query:<br/>{trace_id="abc123"}<br/>returns logs from both tiers
```
The key ingredient is a tiny Alloy pipeline stage that extracts the hex trace ID from the log line and promotes it to a Loki label:
```river
loki.process "enrich" {
  stage.regex {
    expression = "traceparent=00-(?P<trace_id>[0-9a-f]{32})-"
  }

  stage.labels {
    values = { trace_id = "" }
  }
}
```
With that label in place, Grafana's "Derived Fields" feature turns every
trace_id label in a Loki panel into a clickable link to the matching Tempo
trace — giving you end-to-end observability across heterogeneous wrappers.
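Provisioned as a Loki datasource, the derived field might look like the following sketch. The datasource names and the Tempo UID are assumptions; the regex mirrors the traceparent pattern used in the Alloy stage.

```yaml
# grafana/provisioning/datasources/loki.yaml (illustrative path)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: https://loki.example.com
    jsonData:
      derivedFields:
        - name: trace_id
          # Same traceparent pattern as the Alloy stage.
          matcherRegex: "traceparent=00-([0-9a-f]{32})-"
          # $$ escapes $ in provisioning files; the captured id
          # becomes the query sent to Tempo.
          url: "$${__value.raw}"
          datasourceUid: tempo   # UID of the Tempo datasource
```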
Required environment variables¶
These are read by Alloy at startup (via /etc/default/alloy in LXC, env
block in the DaemonSet):
| Variable | Example | Purpose |
|---|---|---|
| `LOKI_URL` | `https://loki.example.com/loki/api/v1/push` | Loki push endpoint |
| `TEMPO_URL` | `tempo.example.com:4317` | OTLP gRPC for Tempo |
| `PROM_REMOTE_WRITE_URL` | `https://prom.example.com/api/v1/write` | Prometheus remote_write |
| `MOE_HOSTNAME` | `lxc-edge-1` | Host label applied to every log |
| `MOE_CLUSTER` | `lxc` / `homelab` / `prod-eu1` | Cluster label; the primary dimension for cross-tier filtering |
Metrics side¶
The orchestrator exposes a prometheus_client endpoint at :8000/metrics.
Alloy scrapes it every 15 s and forwards via prometheus.remote_write:
```river
prometheus.scrape "moe_orchestrator" {
  targets         = [{ "__address__" = "127.0.0.1:8000", "job" = "moe-orchestrator" }]
  metrics_path    = "/metrics"
  scrape_interval = "15s"
  forward_to      = [prometheus.remote_write.default.receiver]
}
```
So even in a federated setup where LXC edges write to a central Prometheus,
the metric label set (cluster, host, job) matches the Loki label set —
you can pivot from a Grafana dashboard panel straight into the matching logs.
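One way to keep the metric labels aligned with the Loki labels is to set `external_labels` on the remote_write component from the same `MOE_*` variables; a sketch, assuming the component is named `default` as in the scrape block above:

```river
prometheus.remote_write "default" {
  endpoint {
    url = sys.env("PROM_REMOTE_WRITE_URL")
  }

  // Same cluster/host values the Loki pipeline attaches,
  // so dashboards can pivot between metrics and logs.
  external_labels = {
    cluster = sys.env("MOE_CLUSTER"),
    host    = sys.env("MOE_HOSTNAME"),
  }
}
```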
Quick smoke test¶
On any tier, after deployment:
```bash
# 1. Fire a request with a known trace id
TID=$(openssl rand -hex 16)
curl -X POST http://<orchestrator>:8000/v1/chat/completions \
  -H "traceparent: 00-${TID}-$(openssl rand -hex 8)-01" \
  -H 'Content-Type: application/json' \
  -d '{"model":"auto","messages":[{"role":"user","content":"ping"}]}'

# 2. Find the same trace_id in Loki (from any tier)
#    In Grafana: { cluster=~".+" } |= "$TID"

# 3. Click the trace_id in the result → Tempo trace opens
```
If step 2 returns lines from more than one cluster label, the cross-tier
correlation is working.
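Step 2 can also be scripted against Loki's HTTP API instead of Grafana. A stdlib-only sketch (the endpoint path and response shape are Loki's `query_range` API; the function names are illustrative):

```python
import json
import urllib.parse
import urllib.request

LOKI_API = "/loki/api/v1/query_range"

def build_query_url(loki_url: str, trace_id: str) -> str:
    """Build the query_range URL for: any cluster, lines with the trace id."""
    query = '{cluster=~".+"} |= "%s"' % trace_id
    params = urllib.parse.urlencode({"query": query, "limit": 100})
    return loki_url.rstrip("/") + LOKI_API + "?" + params

def clusters_for_trace(loki_url: str, trace_id: str) -> set:
    """Return the set of `cluster` labels whose streams contain the
    trace id; more than one element means cross-tier correlation works."""
    with urllib.request.urlopen(build_query_url(loki_url, trace_id)) as resp:
        body = json.load(resp)
    return {s["stream"].get("cluster", "") for s in body["data"]["result"]}
```

For example, `clusters_for_trace("https://loki.example.com", TID)` returning both `lxc` and `k8s` is the scripted equivalent of the check in step 2.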