Deployment Guide

Deploy Cordum with Docker Compose for development, Kubernetes manifests for production, or Helm charts for managed environments. This guide covers service topology, production hardening, scaling, and backup strategies.

Docker Compose

Local development. Single machine, all services, hot reload. Start in under a minute.

docker compose up -d

Kubernetes

Production deployment with kustomize overlays. TLS, HA, monitoring, network policies.

kubectl apply -k deploy/k8s/production

Helm Chart

Managed deployment with configurable values. External NATS/Redis support.

helm install cordum ./cordum-helm

Service Topology

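Port Matrix

| Service         | gRPC  | HTTP | Metrics         | Other                          |
|-----------------|-------|------|-----------------|--------------------------------|
| NATS            | -     | -    | 8222            | 4222 (client), 6222 (cluster)  |
| Redis           | -     | -    | 9121 (exporter) | 6379                           |
| API Gateway     | 8080  | 8081 | 9092            | -                              |
| Scheduler       | -     | -    | 9090            | -                              |
| Safety Kernel   | 50051 | -    | -               | -                              |
| Workflow Engine | -     | 9093 | -               | -                              |
| Context Engine  | 50070 | -    | -               | -                              |
| Dashboard       | -     | 8082 | -               | -                              |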

Docker Compose

The default Docker Compose setup starts all 8 services with health checks, dependency ordering, and volume persistence. All config files are mounted from the config/ directory.

Quick Start
# Set API key (required)
export CORDUM_API_KEY="$(openssl rand -hex 32)"

# Start all services
docker compose up -d

# Verify health
docker compose ps

# Smoke test
bash ./tools/scripts/platform_smoke.sh
Key Environment Variables
CORDUM_API_KEY: Required. Startup fails if it is missing.
NATS_USE_JETSTREAM=1: Durable delivery for the scheduler, gateway, and workflow engine.
API_RATE_LIMIT_RPS=2000: Per-tenant rate limit.
JOB_META_TTL=168h: Job metadata retention (7 days).
REDIS_DATA_TTL=24h: Context/result pointer TTL.
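
Because startup fails when CORDUM_API_KEY is missing, a small preflight check before starting the stack can surface the problem earlier. The sketch below is not part of Cordum: check_api_key is a hypothetical helper name, and the 64-hex-character rule only mirrors the output of `openssl rand -hex 32` from the Quick Start. Cordum itself may accept other key formats.

```shell
# Hypothetical preflight helper; not shipped with Cordum.
# Returns 0 when the value looks like the output of `openssl rand -hex 32`
# (64 lowercase hex characters), non-zero otherwise.
check_api_key() {
  key=$1
  if [ -z "$key" ]; then
    echo "CORDUM_API_KEY is not set" >&2
    return 1
  fi
  case $key in
    *[!0-9a-f]*)
      echo "CORDUM_API_KEY contains non-hex characters" >&2
      return 1
      ;;
  esac
  if [ "${#key}" -ne 64 ]; then
    echo "CORDUM_API_KEY should be 64 hex characters, got ${#key}" >&2
    return 1
  fi
  return 0
}

# Example: validate before starting the stack.
# check_api_key "${CORDUM_API_KEY:-}" && docker compose up -d
```

The length check is deliberately strict; relax it if you issue keys in another format.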
Config File Mounts
config/nats.conf → /etc/nats/nats.conf (NATS)
config/pools.yaml → /etc/cordum/pools.yaml (Scheduler)
config/timeouts.yaml → /etc/cordum/timeouts.yaml (Scheduler)
config/safety.yaml → /etc/cordum/safety.yaml (Safety Kernel)

Volumes: nats_data and redis_data for persistence across restarts.

Health checks: All services use 10s interval, 3s timeout, 3 retries, 10s start period. NATS/Redis use TCP socket checks. Control plane services use HTTP/gRPC health endpoints. Services start in dependency order with depends_on: condition: service_healthy.
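
Scripts that run outside Compose (for example, a CI job that launches the smoke test) do not benefit from depends_on ordering, so they may want to mirror the same retry semantics. The following is a minimal sketch: wait_healthy is a hypothetical helper, and the health URL in the usage comment is an assumed path, not a documented Cordum endpoint.

```shell
# wait_healthy RETRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most RETRIES times, sleeping
# INTERVAL_SECONDS between attempts. Returns 0 on the first success,
# 1 if every attempt fails.
wait_healthy() {
  retries=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # No need to sleep after the final attempt.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumed health URL; adjust to the gateway's actual endpoint):
# wait_healthy 3 10 curl -fsS http://localhost:8081/health \
#   && bash ./tools/scripts/platform_smoke.sh
```

The arguments 3 and 10 correspond to the 3 retries and 10s interval used by the Compose health checks.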

Kubernetes

Kubernetes manifests use a base + overlay pattern via Kustomize. The base deploys all services as Deployments. The production overlay replaces NATS and Redis with StatefulSets, adds TLS, HA, monitoring, network policies, ingress, and backup CronJobs.

Base (Development)
# Deploy base manifests
kubectl apply -f deploy/k8s/base.yaml

# Namespace: cordum
# All services as Deployments
# emptyDir volumes (no persistence)
Production Overlay
# Deploy production overlay
kubectl apply -k deploy/k8s/production

# Adds: StatefulSet NATS (3 replicas)
#        StatefulSet Redis (6-node cluster)
#        TLS/mTLS for all services
#        NetworkPolicies
#        HPA + PodDisruptionBudgets
#        ServiceMonitors + PrometheusRules
#        Ingress (edit host first)
#        Backup CronJobs (hourly)
Resource Requests / Limits

| Service         | CPU (req / limit) | Memory (req / limit) | Replicas        |
|-----------------|-------------------|----------------------|-----------------|
| API Gateway     | 200m / 1000m      | 256Mi / 1Gi          | 2 (HPA: 2–10)   |
| Scheduler       | 150m / 750m       | 256Mi / 768Mi        | 2 (HPA: 2–10)   |
| Safety Kernel   | 100m / 500m       | 128Mi / 512Mi        | 2               |
| Workflow Engine | 150m / 750m       | 256Mi / 768Mi        | 1               |
| Context Engine  | 100m / 500m       | 128Mi / 512Mi        | 1               |
| Dashboard       | 100m / 500m       | 128Mi / 512Mi        | 1               |
| NATS            | 200m / 1000m      | 256Mi / 1Gi          | 3 (StatefulSet) |
| Redis           | 200m / 1000m      | 256Mi / 1Gi          | 6 (Cluster)     |
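
For capacity planning, it can help to total the requests across all replicas. The sketch below sums the request columns from the table at the baseline replica counts (before any HPA scale-out); total_requests is a hypothetical helper, and the figures exclude sidecars, exporters, and CronJobs.

```shell
# Sum CPU (millicores) and memory (Mi) requests across all replicas.
# Columns: service | CPU request (m) | memory request (Mi) | replicas.
# Values are copied from the resource table at baseline replica counts.
total_requests() {
  awk -F'|' '
    { cpu += $2 * $4; mem += $3 * $4 }
    END { printf "%dm CPU, %dMi memory\n", cpu, mem }
  ' <<'EOF'
API Gateway|200|256|2
Scheduler|150|256|2
Safety Kernel|100|128|2
Workflow Engine|150|256|1
Context Engine|100|128|1
Dashboard|100|128|1
NATS|200|256|3
Redis|200|256|6
EOF
}

total_requests   # prints: 3050m CPU, 4096Mi memory
```

The cluster therefore needs roughly 3 CPU cores and 4Gi of memory schedulable for requests alone, with extra headroom if the API Gateway and Scheduler scale toward 10 replicas.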

ConfigMaps

  • cordum-pools — topic-to-pool routing
  • cordum-timeouts — reconciler and per-topic timeouts
  • cordum-safety — safety kernel policy YAML
  • cordum-nats-config — NATS server configuration

Required Secrets

  • cordum-api-key — API authentication key
  • cordum-nats-server-tls — NATS server cert/key/CA
  • cordum-redis-server-tls — Redis server cert/key/CA
  • cordum-client-tls — shared client cert for services

Helm Charts

The Cordum Helm chart (v0.1.4) packages all services with configurable values. It supports external NATS/Redis, custom resource limits, ingress, and per-service toggles.

Install Commands
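# Local chart (assumes CORDUM_API_KEY is exported in your shell;
# an unquoted <placeholder> would be parsed as a shell redirect)
helm install cordum ./cordum-helm \
  -n cordum --create-namespace \
  --set secrets.apiKey="$CORDUM_API_KEY"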

# Published chart
helm repo add cordum https://charts.cordum.io
helm install cordum cordum/cordum \
  -n cordum --create-namespace

# External NATS/Redis (no embedded services)
helm install cordum ./cordum-helm \
  -n cordum --create-namespace \
  --set nats.enabled=false \
  --set redis.enabled=false \
  --set external.natsUrl=nats://nats.example.com:4222 \
  --set external.redisUrl=redis://redis.example.com:6379

# Upgrade
helm upgrade cordum ./cordum-helm -n cordum

# Port-forward for local access
kubectl -n cordum port-forward svc/cordum-api-gateway 8081:8081
kubectl -n cordum port-forward svc/cordum-dashboard 8082:8080
Key values.yaml Settings
global.image.tag: image version (default: v0.1.4)
secrets.apiKey: API key (required)
nats.enabled: deploy embedded NATS (default: true)
redis.enabled: deploy embedded Redis (default: true)
nats.persistence.enabled: PVC for JetStream data
ingress.enabled: create Ingress resource
gateway.replicaCount: gateway replicas (default: 1)
gateway.env.userAuthEnabled: enable user/password auth
external.natsUrl: external NATS connection
external.redisUrl: external Redis connection
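
Passing many --set flags gets unwieldy, so the same settings can be collected in an override file. The sketch below writes one using only the keys listed above; the file name and the chosen values are illustrative assumptions, and the value type expected by gateway.env.userAuthEnabled (boolean versus string) should be checked against the chart's own values.yaml.

```shell
# Write a values override file using only the documented keys.
# The file name and the chosen values are illustrative assumptions.
values_file=${VALUES_FILE:-./values-override.yaml}

cat > "$values_file" <<'EOF'
global:
  image:
    tag: v0.1.4
nats:
  enabled: true
  persistence:
    enabled: true
redis:
  enabled: true
ingress:
  enabled: true
gateway:
  replicaCount: 2
  env:
    userAuthEnabled: true
EOF

# Apply it (requires helm and cluster access). The API key is passed
# on the command line so it stays out of the file:
# helm upgrade --install cordum ./cordum-helm -n cordum --create-namespace \
#   -f "$values_file" --set secrets.apiKey="$CORDUM_API_KEY"
```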

Production Hardening

The production Kustomize overlay applies TLS, network isolation, HA, monitoring, and backup CronJobs. Review each item before deploying to production.

TLS / mTLS

Encrypt all in-cluster communication. NATS uses tls:// protocol with cert/key/CA. Redis uses rediss:// with tls-auth-clients. All services mount client TLS certs.

API Key Management

CORDUM_API_KEY is required in production (fails startup if missing). Rotate keys via CORDUM_API_KEYS_PATH with hot reload. Store in K8s Secrets or Vault.

Policy Signing

Set SAFETY_POLICY_PUBLIC_KEY for Ed25519 signature verification. Enable SAFETY_POLICY_SIGNATURE_REQUIRED to reject unsigned policy files.

Network Policies

The production overlay includes NetworkPolicies that restrict ingress and egress per service. The Dashboard can only reach the API Gateway. Metrics bind to loopback by default.

Pod Security

All services run as non-root (UID 65532) with a read-only root filesystem and no privilege escalation, using seccompProfile: RuntimeDefault.

Monitoring & Alerts

Prometheus ServiceMonitors for all services (30s scrape). PrometheusRules alert on service downtime (5min threshold, critical severity).

Backup Strategy

Hourly CronJobs write Redis RDB snapshots and NATS stream snapshots to a 20Gi PVC. Copy the snapshots off-cluster for disaster recovery.

RBAC Enforcement

Set CORDUM_REQUIRE_RBAC=true with an enterprise license. This restricts policy, config, approval, and pack operations to authorized roles.

TLS Environment Variables (Production)
# All services
NATS_URL=tls://nats:4222
NATS_TLS_CA=/etc/cordum/tls/client/ca.crt
NATS_TLS_CERT=/etc/cordum/tls/client/tls.crt
NATS_TLS_KEY=/etc/cordum/tls/client/tls.key

REDIS_URL=rediss://redis:6379
REDIS_TLS_CA=/etc/cordum/tls/client/ca.crt
REDIS_TLS_CERT=/etc/cordum/tls/client/tls.crt
REDIS_TLS_KEY=/etc/cordum/tls/client/tls.key

# JetStream replication
NATS_JS_REPLICAS=3

# Redis Cluster
REDIS_CLUSTER_ADDRESSES=cordum-redis-{0..5}:6379
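
The {0..5} notation above is shorthand for the six Redis pods; environment files do not perform brace expansion. The sketch below expands it into an explicit list. It assumes the variable takes comma-separated host:port pairs, which this guide does not confirm, and redis_cluster_addresses is a hypothetical helper.

```shell
# Expand the cordum-redis-{0..5} shorthand into explicit host:port pairs.
# Assumption: REDIS_CLUSTER_ADDRESSES takes a comma-separated list.
redis_cluster_addresses() {
  count=${1:-6}
  port=${2:-6379}
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    # ${out:+,} inserts a comma only when out is already non-empty.
    out="${out}${out:+,}cordum-redis-${i}:${port}"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

redis_cluster_addresses 6 6379
```

Depending on how the StatefulSet's headless Service is named, the pod hostnames may need a fully qualified suffix.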

Before deploying production: Edit the Ingress host in deploy/k8s/production/ingress.yaml, create all TLS secrets, and set CORDUM_API_KEY in the cordum-api-key Secret.

Scaling Considerations

| Service         | Strategy |
|-----------------|----------|
| API Gateway     | HPA: 2–10 replicas (70% CPU / 80% memory). Stateless, scales linearly. |
| Scheduler       | HPA: 2–10 replicas. Leader election via Redis locks: one active instance, the others on standby. |
| Safety Kernel   | 2+ replicas. Stateless, round-robin load balanced. |
| Workflow Engine | Single replica recommended (stateful, requires coordination). |
| NATS            | 3-node cluster (quorum: 2/3). JetStream replicas match cluster size (NATS_JS_REPLICAS=3). |
| Redis           | 6-node cluster (3 primary + 3 replica). Add shards by scaling the StatefulSet and rebalancing. |
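
To reason about how the 70% CPU target translates into replica counts, the standard Kubernetes HPA rule can be applied by hand: desired = ceil(current replicas × current utilization / target utilization), clamped to the configured bounds. The sketch below implements that arithmetic; hpa_desired is a hypothetical helper, and it ignores the HPA's tolerance band and stabilization windows.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL [MIN] [MAX]
# Utilization values are integer percentages. MIN and MAX default to
# the 2-10 range used by the API Gateway and Scheduler.
hpa_desired() {
  current=$1
  util=$2
  target=$3
  min=${4:-2}
  max=${5:-10}
  # Integer ceiling of current * util / target.
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired 2 90 70   # prints 3
hpa_desired 8 95 70   # prints 10 (ceil(10.86) = 11, capped at max)
```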

Persistence

  • NATS JetStream: file-based persistence, 1s fsync
  • Redis: AOF (appendonly yes) + RDB snapshots
  • 20Gi PVCs per StatefulSet pod in production

High Availability

  • PodDisruptionBudgets: maxUnavailable=1 per service
  • Pod anti-affinity spreads replicas across nodes
  • NATS 3-node quorum, Redis 3+3 cluster
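
The quorum figures follow from simple majority arithmetic, which also shows why maxUnavailable=1 is a safe budget for the 3-node NATS cluster. A minimal sketch, with hypothetical helper names:

```shell
# quorum N: smallest majority of an N-node cluster.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: nodes that can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

quorum 3               # prints 2
tolerated_failures 3   # prints 1
```

Growing the cluster to 4 nodes would raise the quorum to 3 while still tolerating only 1 failure, which is why odd cluster sizes are the usual choice.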

Backups

  • Redis: hourly RDB snapshots via CronJob
  • NATS: hourly stream snapshots (CORDUM_SYS, CORDUM_JOBS)
  • 20Gi backup PVC, copy off-cluster for DR
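
Hourly snapshots will eventually fill a fixed 20Gi volume unless old ones are removed. Whether the bundled CronJobs already prune is not stated here, so the following is only a sketch of one possible retention step. prune_snapshots is a hypothetical helper, and it assumes snapshot file names sort chronologically (for example, by a timestamp in the name) and contain no newlines.

```shell
# prune_snapshots DIR KEEP
# Deletes all but the KEEP newest files in DIR, where "newest" means
# last in lexical order. Assumes names sort chronologically.
prune_snapshots() {
  dir=$1
  keep=$2
  ls -1 "$dir" | sort | awk -v keep="$keep" '
    { name[NR] = $0 }
    END { for (i = 1; i <= NR - keep; i++) print name[i] }
  ' | while IFS= read -r f; do
    rm -f -- "$dir/$f"
  done
}

# Example: keep the last 48 hourly snapshots (directory is hypothetical).
# prune_snapshots /backups/redis 48
```

Run the pruning step only after the off-cluster copy has succeeded, so that a failed upload never leaves you with fewer restorable snapshots.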

Single Binary Mode

For local development and quick demos, cordumctl up starts all core services in a single process with embedded NATS and Redis. No Docker required.

Use for

Local development, quick demos, CI/CD smoke tests

Don't use for

Production, multi-node, HA, or persistent workloads

Includes

Gateway, Scheduler, Safety Kernel, Workflow Engine, Context Engine, Dashboard

cordumctl up
# Start all services (single binary)
cordumctl up

# Access
# API:       http://localhost:8081
# Dashboard: http://localhost:8082

# Stop
cordumctl down