Token Budget & Costs¶
The budget system controls token usage per user and calculates the resulting costs based on a configurable token price.
Budget Configuration in Admin Backend¶
Assignment Workflow¶
- Admin opens User Edit → Tab Budget
- Set limits (all optional, empty = unlimited):
- Daily limit (tokens/day, reset daily at midnight UTC)
- Monthly limit (tokens/month, reset on the 1st of the month UTC)
- Total limit (cumulative, no automatic reset)
- Select budget type:
Subscription: Daily and monthly limits are automatically resetOne-time: Onlytotal_limitis relevant, no reset- Save budget → writes to PostgreSQL + syncs Valkey
Budget Types¶
| Type | Reset | Recommended Limits |
|---|---|---|
subscription |
Daily + Monthly | daily_limit + monthly_limit |
onetime |
No reset | Only total_limit |
Token Counting¶
What is counted?¶
Every API request counts both prompt tokens (input) and completion tokens (output):
Cost Factor¶
A cost factor can be applied to the actual token usage:
| Source | Field | Default |
|---|---|---|
| User template | cost_factor (REAL) |
1.0 |
| Inference server | cost_factor (REAL) |
1.0 |
The cost factor is derived from the user's active expert template and/or the inference server used, and cached in Valkey for 24 hours:
Example:
| Raw Tokens | Cost Factor | Effective Tokens |
|---|---|---|
| 1,000 | 1.0 | 1,000 |
| 1,000 | 1.5 | 1,500 |
| 1,000 | 0.8 | 800 |
Valkey Counters (Real-Time)¶
All budget counters run in Valkey for minimal latency:
user:{id}:tokens:daily:{YYYY-MM-DD} → INCRBY TTL 48h
user:{id}:tokens:monthly:{YYYY-MM} → INCRBY TTL 35d
user:{id}:tokens:total → INCRBY
Budget exceeded is detected on the next API request. The request is rejected with HTTP 429.
Cost Calculation (EUR)¶
Configuring Token Price¶
The global token price is set in the Admin Dashboard under Token Price:
Default: 0.00002 EUR per token (= €0.02 per 1,000 tokens).
The value is stored in the .env file as TOKEN_PRICE_EUR.
Cost Formula¶
Example (default configuration):
| Tokens | Cost Factor | Effective | Price/Token | Cost |
|---|---|---|---|---|
| 10,000 | 1.0 | 10,000 | €0.00002 | €0.20 |
| 10,000 | 1.5 | 15,000 | €0.00002 | €0.30 |
| 50,000 | 1.0 | 50,000 | €0.00002 | €1.00 |
Where is the price shown?¶
- Admin → User Edit → Tab Budget: Current usage with
≈ €X.XXbelow each counter - User Portal → Billing: Full breakdown by period and model
- User Portal → Dashboard: Compact budget cards with percentage display
Budget Alerts¶
Users can configure automatic email notifications when their budget is nearly exhausted.
Configuration (by user in the portal)¶
| Field | Description | Default |
|---|---|---|
| Alert enabled | On/Off toggle | Off |
| Threshold | At what % to warn | 80% |
| Email address | Recipient (default: account email) | – |
Behavior¶
- Background task runs hourly
- Checks: was the threshold exceeded in any period?
- Max. 1 alert per user per 24 hours (rate limiting via
last_alert_sent_at) - Period coverage: Daily AND Monthly AND Total
Alert Email Content¶
Subject: Budget Warning: Daily limit 85% exhausted
Your daily token budget is 85% exhausted.
Usage: 85,000 / 100,000 tokens
Remaining: 15,000 tokens
Reset: in 6 hours
Usage Log¶
Every API request is logged in the PostgreSQL database:
CREATE TABLE usage_log (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
api_key_id TEXT,
request_id TEXT NOT NULL,
model TEXT NOT NULL,
moe_mode TEXT NOT NULL,
prompt_tokens INTEGER DEFAULT 0,
completion_tokens INTEGER DEFAULT 0,
total_tokens INTEGER DEFAULT 0,
status TEXT DEFAULT 'ok', -- 'ok' or error code
requested_at TEXT NOT NULL, -- ISO-8601 UTC
notes TEXT -- user-editable note
);
Usage Query (Admin)¶
Returns JSON with all entries from the last N days, including key label and key prefix.
Summary: Life of a Token¶
flowchart TD
S1[1. User sends API request] --> S2["2. Orchestrator validates API key<br/>Valkey, TTL 5min"]
S2 --> S3{"3. Budget check<br/>user:id:tokens:daily ≤ daily_limit?"}
S3 -->|Exceeded| R429([HTTP 429])
S3 -->|OK| S4[4. LLM inference runs]
S4 --> S5[5. Token counting:<br/>prompt_tokens + completion_tokens]
S5 --> S6["6. Retrieve cost factor<br/>Valkey user:id:cost_factor"]
S6 --> S7["7. Effective tokens = raw × cost_factor"]
S7 --> S8["8. Increment Valkey counters<br/>daily + monthly + total"]
S8 --> S9["9. usage_log → PostgreSQL<br/>fire & forget, async"]
S9 --> S10["10. Increment Prometheus counter<br/>moe_tokens_total"]