Configuration Reference
Cordum uses environment variables and YAML config files to configure every service. Config files are validated against embedded JSON schemas at startup; invalid configs are reported as errors, and invalid timeout settings fall back to defaults.
Environment Variables
Shared (all services)
| Variable | Description | Default |
|---|---|---|
| NATS_URL | NATS server connection URL | nats://nats:4222 |
| REDIS_URL | Redis connection URL | redis://redis:6379 |
| NATS_USE_JETSTREAM | Enable JetStream durable delivery (0 or 1) | 0 |
| CORDUM_ENV | Set to production for strict security defaults | — |
| CORDUM_PRODUCTION | Alternative production flag (true/false) | — |
| CORDUM_LOG_FORMAT | Log output format | text |
| CORDUM_TLS_MIN_VERSION | Minimum TLS version (1.2 or 1.3) | 1.3 in prod |
| CORDUM_GRPC_REFLECTION | Enable gRPC reflection (dev only, set to 1) | — |
| POOL_CONFIG_PATH | Path to pools.yaml | — |
| TIMEOUT_CONFIG_PATH | Path to timeouts.yaml | — |
| SAFETY_KERNEL_ADDR | gRPC address of the Safety Kernel, used by services that call it | — |
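The two production flags interact with security defaults such as CORDUM_TLS_MIN_VERSION. The Python sketch below shows one way a service could resolve them. It assumes that either flag alone enables production mode, that an explicit CORDUM_TLS_MIN_VERSION overrides the production default, and that a non-production service with no explicit value keeps its TLS library default. None of these rules is Cordum's confirmed logic.

```python
from typing import Mapping, Optional


def is_production(env: Mapping[str, str]) -> bool:
    # Assumption: either CORDUM_ENV=production or CORDUM_PRODUCTION=true enables production mode.
    if env.get("CORDUM_ENV", "").strip().lower() == "production":
        return True
    return env.get("CORDUM_PRODUCTION", "").strip().lower() == "true"


def tls_min_version(env: Mapping[str, str]) -> Optional[str]:
    # Assumption: an explicit CORDUM_TLS_MIN_VERSION overrides the production default.
    explicit = env.get("CORDUM_TLS_MIN_VERSION", "").strip()
    if explicit:
        if explicit not in ("1.2", "1.3"):
            raise ValueError(f"unsupported CORDUM_TLS_MIN_VERSION: {explicit!r}")
        return explicit
    # Production defaults to TLS 1.3; None means "keep the TLS library default".
    return "1.3" if is_production(env) else None
```

Because os.environ is a Mapping, a service could call tls_min_version(os.environ) directly.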
TLS & Redis Clustering
| Variable | Description | Default |
|---|---|---|
| NATS_TLS_CA / CERT / KEY | CA certificate, client certificate, and private key for NATS client TLS | — |
| NATS_TLS_INSECURE | Skip NATS TLS verification (dev only) | — |
| NATS_TLS_SERVER_NAME | Expected NATS server CN for verification | — |
| REDIS_TLS_CA / CERT / KEY | CA certificate, client certificate, and private key for Redis client TLS | — |
| REDIS_TLS_INSECURE | Skip Redis TLS verification (dev only) | — |
| REDIS_CLUSTER_ADDRESSES | Comma-separated host:port seeds for Redis Cluster | — |
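REDIS_CLUSTER_ADDRESSES is a comma-separated list of host:port seeds. The minimal sketch below splits such a value into (host, port) pairs. Tolerating whitespace and empty entries is an assumption about the format, not documented behavior.

```python
from typing import List, Tuple


def parse_cluster_seeds(value: str) -> List[Tuple[str, int]]:
    """Split a REDIS_CLUSTER_ADDRESSES-style string into (host, port) pairs."""
    seeds = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # assumed: trailing commas and blank entries are ignored
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        seeds.append((host, int(port)))
    return seeds
```

For example, redis-0:6379,redis-1:6379 yields two seeds.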
API Gateway
| Variable | Description | Default |
|---|---|---|
| GATEWAY_HTTP_ADDR | HTTP listen address | :8081 |
| GATEWAY_GRPC_ADDR | gRPC listen address | :8080 |
| GATEWAY_WS_METRICS_ADDR | WebSocket and Prometheus metrics address | :9092 |
| CORDUM_API_KEY | Single API key for authentication | — |
| CORDUM_API_KEYS | Comma-separated or JSON array of API keys | — |
| CORDUM_API_KEYS_PATH | File path to API keys (hot-reloads on change) | — |
| CORDUM_ALLOWED_ORIGINS | CORS allowed origins (also CORDUM_CORS_ALLOW_ORIGINS) | — |
| TENANT_ID | Default tenant ID for single-tenant mode | — |
| API_RATE_LIMIT_RPS | Rate limit requests per second (per tenant) | — |
| API_RATE_LIMIT_BURST | Rate limit burst size (per tenant) | — |
| ARTIFACT_MAX_BYTES | Maximum artifact upload/download size, in bytes | — |
| CORDUM_ALLOW_INSECURE_NO_AUTH | Allow anonymous auth (dev only, set to 1) | — |
| CORDUM_ALLOW_HEADER_PRINCIPAL | Trust X-Principal header (disabled in prod) | — |
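CORDUM_API_KEYS accepts either a comma-separated list or a JSON array. The sketch below accepts both forms. Two details are assumptions: a value that starts with [ is treated as JSON, and the JSON form is an array of plain strings.

```python
import json
from typing import List


def parse_api_keys(raw: str) -> List[str]:
    """Accept either `k1,k2` or `["k1", "k2"]` and return the non-empty keys."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):  # assumed way of telling the two formats apart
        parsed = json.loads(raw)
        if not isinstance(parsed, list) or not all(isinstance(k, str) for k in parsed):
            raise ValueError("JSON form must be an array of strings")
        keys = parsed
    else:
        keys = raw.split(",")
    # Strip whitespace and drop empty entries in both forms.
    return [k.strip() for k in keys if k.strip()]
```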
Gateway TLS
| Variable | Description | Default |
|---|---|---|
| GATEWAY_HTTP_TLS_CERT / KEY | HTTP server TLS certificate and key | — |
| GRPC_TLS_CERT / KEY | gRPC server TLS certificate and key | — |
JWT Authentication
| Variable | Description | Default |
|---|---|---|
| CORDUM_JWT_HMAC_SECRET | HMAC secret for JWT verification | — |
| CORDUM_JWT_PUBLIC_KEY | RSA/ECDSA public key (inline PEM) | — |
| CORDUM_JWT_PUBLIC_KEY_PATH | Path to public key file | — |
| CORDUM_JWT_ISSUER | Expected JWT issuer claim | — |
| CORDUM_JWT_AUDIENCE | Expected JWT audience claim | — |
| CORDUM_JWT_DEFAULT_ROLE | Default role for JWT-authenticated users | — |
| CORDUM_JWT_REQUIRED | Require JWT auth on all requests | — |
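The stdlib-only Python sketch below shows which checks CORDUM_JWT_HMAC_SECRET, CORDUM_JWT_ISSUER, and CORDUM_JWT_AUDIENCE feed into. It assumes the HMAC algorithm is HS256, which the table does not specify. It skips nbf and clock-skew handling, so real deployments are better served by a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, secret: bytes, issuer: Optional[str] = None,
                 audience: Optional[str] = None, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims if signature, exp, iss and aud check out; raise ValueError otherwise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # assumption: HS256 is the HMAC variant in use
        raise ValueError("unexpected alg")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    if issuer is not None and claims.get("iss") != issuer:
        raise ValueError("issuer mismatch")
    if audience is not None:
        aud = claims.get("aud")
        auds = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
        if audience not in auds:
            raise ValueError("audience mismatch")
    return claims
```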
User Authentication
| Variable | Description | Default |
|---|---|---|
| CORDUM_USER_AUTH_ENABLED | Enable user/password auth (stores users in Redis) | — |
| CORDUM_ADMIN_USERNAME | Default admin username | admin |
| CORDUM_ADMIN_PASSWORD | Password for the default admin user, which is created on first startup | — |
| CORDUM_ADMIN_EMAIL | Optional admin email | — |
Scheduler
| Variable | Description | Default |
|---|---|---|
| JOB_META_TTL | TTL for job metadata in Redis (also JOB_META_TTL_SECONDS) | — |
| WORKER_SNAPSHOT_INTERVAL | Interval for worker snapshot updates | — |
| SCHEDULER_CONFIG_RELOAD_INTERVAL | Config overlay reload interval | 30s |
| NATS_JS_ACK_WAIT | JetStream acknowledgment wait timeout | — |
| NATS_JS_MAX_AGE | JetStream message max age | — |
| NATS_JS_REPLICAS | JetStream stream replication factor | — |
| SCHEDULER_METRICS_ADDR | Prometheus metrics address | :9090 |
Safety Kernel
| Variable | Description | Default |
|---|---|---|
| SAFETY_KERNEL_ADDR | gRPC listen address for the Safety Kernel | — |
| SAFETY_POLICY_PATH | Path to safety.yaml policy file | — |
| SAFETY_POLICY_URL | URL to fetch policy (alternative to file path) | — |
| SAFETY_POLICY_URL_ALLOWLIST | Comma-separated allowed hostnames for policy URLs | — |
| SAFETY_DECISION_CACHE_TTL | Decision cache TTL | 0 (disabled) |
| SAFETY_POLICY_RELOAD_INTERVAL | Interval for policy file hot reload | — |
| SAFETY_POLICY_PUBLIC_KEY | Ed25519 public key for policy signature verification | — |
| SAFETY_POLICY_SIGNATURE_REQUIRED | Require signed policy files (true/false) | — |
| SAFETY_POLICY_CONFIG_SCOPE | Config-service scope for policy fragments | — |
| SAFETY_POLICY_CONFIG_DISABLE | Disable config-service policy overlays | — |
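SAFETY_POLICY_URL_ALLOWLIST restricts where SAFETY_POLICY_URL may point. The sketch below shows such a check. It assumes exact, case-insensitive hostname matching and an https-only rule. This page does not say whether Cordum also accepts subdomains or plain http.

```python
from typing import Iterable
from urllib.parse import urlsplit


def policy_url_allowed(url: str, allowlist: Iterable[str]) -> bool:
    """True if url uses https (assumed rule) and its hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    allowed = {h.strip().lower() for h in allowlist if h.strip()}
    # urlsplit lowercases .hostname, so the comparison is case-insensitive.
    return parts.hostname in allowed
```

A caller would typically pass the environment value split on commas, for example SAFETY_POLICY_URL_ALLOWLIST.split(",").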
Safety Kernel TLS
| Variable | Description | Default |
|---|---|---|
| SAFETY_KERNEL_TLS_CERT / KEY | gRPC server TLS certificate and key | — |
| SAFETY_KERNEL_TLS_CA | Client CA for mTLS verification | — |
| SAFETY_KERNEL_TLS_REQUIRED | Require TLS for all connections | — |
The Safety Kernel reads policy bundle fragments from the config service in Redis. Ensure REDIS_URL is set when using pack policy overlays.
Workflow Engine
| Variable | Description | Default |
|---|---|---|
| WORKFLOW_ENGINE_HTTP_ADDR | HTTP listen address for the workflow engine | — |
| WORKFLOW_ENGINE_SCAN_INTERVAL | Interval for scanning pending workflow runs | — |
| WORKFLOW_ENGINE_RUN_SCAN_LIMIT | Max runs to process per scan cycle | — |
Context Engine
| Variable | Description | Default |
|---|---|---|
| CONTEXT_ENGINE_ADDR | gRPC listen address | — |
| CONTEXT_ENGINE_TLS_CERT / KEY | gRPC server TLS certificate and key | — |
| CONTEXT_ENGINE_TLS_CA | Client CA for mTLS | — |
| CONTEXT_ENGINE_TLS_REQUIRED | Require TLS for all connections | — |
Enterprise
| Variable | Description | Default |
|---|---|---|
| CORDUM_LICENSE_PATH | Path to signed license file | — |
| CORDUM_LICENSE_KEY | Signed license key/token | — |
| CORDUM_REQUIRE_RBAC | Enable role checks on policy/config/approvals/packs | — |
Enterprise auth/licensing features are delivered by the separate cordum-enterprise repository.
Config Files
Docker Compose mounts these files from config/. The control plane validates them against embedded JSON schemas at startup. Invalid configs are reported as errors, and invalid timeout settings fall back to defaults.
pools.yaml — Topic-to-Pool Routing
Maps job topics to worker pools. Each pool can declare requires, a list of capability labels a worker must advertise to join the pool. The scheduler combines this mapping with a least-loaded strategy to dispatch jobs.
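The Python sketch below models that dispatch decision, using topic and pool names from the sample pools.yaml in this section. The Worker type, its pool and active_jobs fields, and the tie-break by worker ID are all hypothetical. It also assumes a worker qualifies when it reports the pool and advertises every required label. The real scheduler works from its own worker snapshots.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Worker:
    # Hypothetical worker snapshot: ID, pool it joined, advertised labels, current load.
    worker_id: str
    pool: str
    capabilities: FrozenSet[str] = field(default_factory=frozenset)
    active_jobs: int = 0


def pick_worker(topic: str, topics: Dict[str, str], pools: Dict[str, List[str]],
                workers: List[Worker]) -> Optional[Worker]:
    """Route topic -> pool, keep workers meeting the pool's requires, pick the least loaded."""
    pool = topics.get(topic)
    if pool is None:
        return None  # unmapped topic: no pool to dispatch to
    required = set(pools.get(pool, []))
    eligible = [w for w in workers if w.pool == pool and required <= w.capabilities]
    if not eligible:
        return None
    # Least-loaded first; worker_id breaks ties deterministically.
    return min(eligible, key=lambda w: (w.active_jobs, w.worker_id))
```

With requires: [k8s, logs], a worker advertising only k8s is never chosen, even if it is idle.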
```yaml
topics:
  job.default: default
  job.sre-investigator.collect.k8s: sre-investigators
  job.sre-investigator.collect.logs: sre-investigators
  job.deploy-agents.apply: deploy-agents
  job.compliance-agents.process: compliance-agents
pools:
  default:
    requires: []
  sre-investigators:
    requires: [k8s, logs]
  deploy-agents:
    requires: [deploy]
  compliance-agents:
    requires: []
```
safety.yaml — Safety Kernel Policy
Per-tenant policy configuration for the Safety Kernel. Controls topic allow/deny lists, repository host restrictions, and MCP (Model Context Protocol) server/tool/resource filtering.
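The sketch below shows one way allow_topics and deny_topics could be evaluated for a single tenant. It rests on four assumptions, none confirmed by this page:

- patterns are glob-style, as in Python's fnmatch;
- a deny match always wins;
- an empty allow list permits any topic that is not denied;
- unknown tenants fall back to default_tenant.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def topic_allowed(policy: Dict[str, Any], tenant: str, topic: str) -> bool:
    """Evaluate a topic against a safety.yaml-shaped dict (assumed semantics)."""
    tenants = policy.get("tenants", {})
    # Assumed: unknown tenants use the default_tenant rules.
    rules = tenants.get(tenant) or tenants.get(policy.get("default_tenant", ""), {})
    if any(fnmatchcase(topic, p) for p in rules.get("deny_topics", [])):
        return False  # assumed: deny always wins
    allow = rules.get("allow_topics", [])
    # Assumed: an empty allow list means "allow everything not denied".
    return not allow or any(fnmatchcase(topic, p) for p in allow)
```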
```yaml
default_tenant: default
tenants:
  default:
    allow_topics:
      - "job.*"
    deny_topics:
      - "sys.*"
    allowed_repo_hosts: []
    denied_repo_hosts: []
    mcp:
      allow_servers: []
      deny_servers: []
      allow_tools: []
      deny_tools: []
      allow_resources: []
      deny_resources: []
      allow_actions: []
      deny_actions: []
```
system.yaml — System-Wide Configuration
Sample payload for the config service. Not mounted by default — use POST /api/v1/config to store it. Controls budgets, rate limits, retry policy, resource limits, model access, context windows, SLOs, and integrations.
```yaml
safety:
  pii_detection_enabled: true
  pii_action: "block"
  injection_detection: true
  injection_sensitivity: "high"
  content_filter_enabled: true
budget:
  daily_limit_usd: 1000.0
  monthly_limit_usd: 10000.0
  per_job_max_usd: 5.0
  per_workflow_max_usd: 50.0
  alert_at_percent: [50, 75, 90, 100]
  action_at_limit: "throttle"
rate_limits:
  requests_per_minute: 120000
  concurrent_jobs: 10000
  concurrent_workflows: 5
  queue_size: 5000
retry:
  max_retries: 3
  initial_backoff: 1s
  max_backoff: 30s
  backoff_multiplier: 2.0
resources:
  default_priority: "interactive"
  max_timeout_seconds: 300
  default_timeout_seconds: 60
  max_parallel_steps: 10
models:
  allowed_models: ["gpt-4", "llama-3", "claude-3"]
  default_model: "gpt-4"
  fallback_models: ["llama-3"]
context:
  max_context_tokens: 4000
  max_retrieved_chunks: 10
  cross_tenant_access: false
slo:
  target_p95_latency_ms: 1000
  error_rate_budget: 0.01
  timeout_seconds: 60
```
- budget: Daily/monthly/per-job/per-workflow USD limits with alert thresholds and a throttle action
- rate_limits: Requests per minute (RPM), burst, concurrent jobs/workflows, and queue depth limits per tenant
- retry: Max retries, exponential backoff (initial, max, multiplier), retryable error classes
- resources: Priority, timeouts, max parallel steps, preemption settings
- models: Allowed/default/fallback model lists for LLM-backed jobs
- context: Token limits, chunk retrieval, cross-tenant access, allowed connectors
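With the sample retry values (max_retries: 3, initial_backoff: 1s, max_backoff: 30s, backoff_multiplier: 2.0), a capped exponential schedule would wait 1s, 2s, and 4s. The sketch below computes that schedule. It also lists which alert_at_percent thresholds a given spend has reached. Two points are assumptions, not documented behavior: there is no jitter, and an alert fires once spend reaches its percentage of the limit.

```python
from typing import List


def backoff_schedule(max_retries: int, initial: float, maximum: float,
                     multiplier: float) -> List[float]:
    """Delay in seconds before each retry: initial * multiplier**n, capped at maximum (no jitter)."""
    return [min(initial * multiplier ** n, maximum) for n in range(max_retries)]


def crossed_alerts(spent_usd: float, limit_usd: float, thresholds: List[int]) -> List[int]:
    """Thresholds (percent of limit) that current spend has reached (assumed semantics)."""
    pct = 100.0 * spent_usd / limit_usd
    return [t for t in thresholds if pct >= t]
```

Raising max_retries to 6 in this model gives 1, 2, 4, 8, 16, then 30 seconds, because the sixth delay (32s) hits the 30s cap.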
timeouts.yaml — Timeout Configuration
Per-topic and per-workflow timeout overrides. The reconciler uses these values to mark stale DISPATCHED and RUNNING jobs as TIMEOUT.
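The sketch below models the reconciler check with the reconciler values from the sample file. The sample leaves topics: {} empty and does not show its schema. The override shape used here is therefore hypothetical: a glob-style topic pattern mapped to a number of seconds, applied only to RUNNING jobs, with the first matching pattern winning.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional

DISPATCH_TIMEOUT_S = 300  # reconciler.dispatch_timeout_seconds
RUNNING_TIMEOUT_S = 9000  # reconciler.running_timeout_seconds


def running_limit(topic: str, topic_timeouts: Optional[Dict[str, int]]) -> int:
    # Hypothetical shape {topic_pattern: seconds}; the first matching pattern wins.
    for pattern, seconds in (topic_timeouts or {}).items():
        if fnmatchcase(topic, pattern):
            return seconds
    return RUNNING_TIMEOUT_S


def reconcile(state: str, age_s: float, topic: str,
              topic_timeouts: Optional[Dict[str, int]] = None) -> str:
    """Return "TIMEOUT" for a stale DISPATCHED or RUNNING job, else the unchanged state."""
    if state == "DISPATCHED":
        limit = DISPATCH_TIMEOUT_S
    elif state == "RUNNING":
        limit = running_limit(topic, topic_timeouts)
    else:
        return state  # other states are left alone
    return "TIMEOUT" if age_s > limit else state
```

In the real reconciler this check would run every scan_interval_seconds over the tracked jobs.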
```yaml
# Per-workflow timeouts (keyed by workflow ID)
workflows: {}
# Per-topic timeouts (keyed by topic pattern)
topics: {}
# Reconciler settings
reconciler:
  dispatch_timeout_seconds: 300  # 5 min for DISPATCHED → TIMEOUT
  running_timeout_seconds: 9000  # 2.5 hrs for RUNNING → TIMEOUT
  scan_interval_seconds: 30      # How often the reconciler scans
```
nats.conf — NATS Server Configuration
NATS server configuration for JetStream durability, cluster settings, and authorization. Mounted into the NATS container via Docker Compose or Kubernetes ConfigMap.
```
listen: 0.0.0.0:4222

jetstream {
  store_dir: /data/jetstream
  max_mem: 1G
  max_file: 10G
  sync_interval: "1s"  # fsync cadence (lower = safer, slower)
}

# Optional cluster config
# cluster {
#   listen: 0.0.0.0:6222
#   routes: [nats-route://nats-1:6222, nats-route://nats-2:6222]
# }
```
Config Scopes & Merging
Configuration follows a scope hierarchy. More specific scopes override broader ones through a shallow merge: a top-level key set in a more specific scope replaces that key's entire value from broader scopes.
| Scope key | Description |
|---|---|
| cfg:system:<id> | Platform-wide defaults (budgets, rate limits, models) |
| cfg:org:<id> | Organization-level overrides |
| cfg:team:<id> | Team-level overrides |
| cfg:workflow:<id> | Per-workflow overrides |
| cfg:step:<id> | Per-step overrides (most specific) |
Merge Behavior
- Shallow merge: more-specific scope keys replace broader scope keys
- Arrays are replaced, not appended
- Missing scopes are skipped — only defined scopes participate
- Final merged result is the "effective config"
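Shallow merging has a consequence that is easy to miss. Overriding one nested key, such as budget.per_job_max_usd, at workflow scope replaces the whole budget object. The sketch below models the merge rules listed above and assumes each scope's data is a plain dict. The scope names mirror the cfg:<scope>:<id> keys, but this is not the config service's code.

```python
from typing import Any, Dict, Optional

# Broadest to most specific, following the scope hierarchy above.
SCOPE_ORDER = ["system", "org", "team", "workflow", "step"]


def effective_config(scopes: Dict[str, Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Shallow-merge scope documents; more specific top-level keys replace broader ones."""
    merged: Dict[str, Any] = {}
    for name in SCOPE_ORDER:
        doc = scopes.get(name)
        if doc is None:
            continue  # missing scopes do not participate
        merged.update(doc)  # top-level replace: nested objects and arrays are not merged
    return merged
```

In this model, a workflow scope that sets only budget.per_job_max_usd drops daily_limit_usd from the effective budget. To keep daily_limit_usd, the workflow-scope document has to repeat it.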
REST API (Config)
```bash
# Get config at a scope (envelope mode)
curl "http://localhost:8081/api/v1/config?scope=system&scope_id=default&envelope=true" \
  -H "X-API-Key: $CORDUM_API_KEY" \
  -H "X-Tenant-ID: default"

# Set config at a scope
curl -X POST http://localhost:8081/api/v1/config \
  -H "X-API-Key: $CORDUM_API_KEY" \
  -H "X-Tenant-ID: default" \
  -H "Content-Type: application/json" \
  -d '{"scope":"system","scope_id":"default","data":{"timeouts":{"job_timeout_sec":300}}}'

# View merged effective config
curl "http://localhost:8081/api/v1/config/effective?workflow_id=sre.triage" \
  -H "X-API-Key: $CORDUM_API_KEY" \
  -H "X-Tenant-ID: default"
```
Hot Reload
Config files and policy are hot-reloaded without service restart. Each revision is tracked with a SHA256 hash for cache validation and rollback.
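A SHA256 revision lets a reloader skip work when nothing changed and keep caches keyed by revision valid. The sketch below assumes two things: the hash covers the raw bytes, so whitespace-only edits count as new revisions, and snapshots use a hypothetical Snapshot type. Keeping earlier snapshots by revision is what would make rollback possible.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical snapshot: raw config bytes plus their SHA256 revision.
    revision: str
    content: bytes


def load_if_changed(content: bytes, current: Optional[Snapshot]) -> Snapshot:
    """Return a new Snapshot when the content hash differs; otherwise keep the current one."""
    revision = hashlib.sha256(content).hexdigest()
    if current is not None and current.revision == revision:
        return current  # unchanged: caches keyed by this revision stay valid
    return Snapshot(revision=revision, content=content)
```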
Config Overlays
The scheduler reloads config overlays from Redis at SCHEDULER_CONFIG_RELOAD_INTERVAL (default 30s). Changes to pool routing, timeouts, and system config take effect without restart.
Safety Policy
The Safety Kernel checks the policy file every SAFETY_POLICY_RELOAD_INTERVAL and reloads policy bundle fragments from the config service. Each reload produces a new policy snapshot identified by its SHA256 hash.
API Keys
When using CORDUM_API_KEYS_PATH, the gateway watches the file and reloads keys on change. No restart required for key rotation.
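The KeyFile class below is a hypothetical helper showing reload-on-change for a keys file. It re-reads the file only when its modification time or size changes. The file format is an assumption, because this page does not document it: one key per line, with blank lines and # comments ignored.

```python
import os
from typing import FrozenSet, Optional, Tuple


class KeyFile:
    """Re-read an API key file whenever its mtime or size changes (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._stamp: Optional[Tuple[int, int]] = None
        self._keys: FrozenSet[str] = frozenset()

    def keys(self) -> FrozenSet[str]:
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            with open(self.path, encoding="utf-8") as f:
                # Assumed format: one key per line; blank lines and '#' comments are skipped.
                self._keys = frozenset(
                    line.strip() for line in f
                    if line.strip() and not line.lstrip().startswith("#")
                )
            self._stamp = stamp
        return self._keys
```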
NATS JetStream Durability
JetStream fsync cadence is controlled by sync_interval in the NATS server config. Lower values improve crash durability at the cost of throughput.
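| Deployment | Where to change sync_interval |
|---|---|
| Docker Compose | Edit config/nats.conf |
| Kubernetes (base) | Edit the cordum-nats-config ConfigMap in deploy/k8s/base.yaml |
| Kubernetes (production) | Edit the ConfigMap in deploy/k8s/production/nats.yaml |
| Helm | Set nats.jetstream.syncInterval in values.yaml |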
