Deployment Guide
Deploy Cordum with Docker Compose for development, Kubernetes manifests for production, or Helm charts for managed environments. This guide covers service topology, production hardening, scaling, and backup strategies.
Docker Compose
Local development. Single machine, all services, hot reload. Start in under a minute.
docker compose up -d
Kubernetes
Production deployment with kustomize overlays. TLS, HA, monitoring, network policies.
kubectl apply -k deploy/k8s/production
Helm Chart
Managed deployment with configurable values. External NATS/Redis support.
helm install cordum ./cordum-helm
Service Topology
| Service | gRPC | HTTP | Metrics | Other |
|---|---|---|---|---|
| NATS | — | — | 8222 | 4222 (client), 6222 (cluster) |
| Redis | — | — | 9121 (exporter) | 6379 |
| API Gateway | 8080 | 8081 | 9092 | — |
| Scheduler | — | — | 9090 | — |
| Safety Kernel | 50051 | — | — | — |
| Workflow Engine | — | 9093 | — | — |
| Context Engine | 50070 | — | — | — |
| Dashboard | — | 8082 | — | — |
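The port table above can be turned into a quick reachability check. The following is a minimal sketch, not part of Cordum: the short service names and the helpers `cordum_ports`, `cordum_port`, and `probe_ports` are hypothetical, and it assumes `nc` is installed and that the ports are reachable from where you run it (which ports are published depends on your Compose or Service configuration).

```shell
# Port list copied from the Service Topology table: "service kind port".
cordum_ports() {
  cat <<'EOF'
nats client 4222
nats cluster 6222
nats metrics 8222
redis client 6379
redis metrics 9121
api-gateway grpc 8080
api-gateway http 8081
api-gateway metrics 9092
scheduler metrics 9090
safety-kernel grpc 50051
workflow-engine http 9093
context-engine grpc 50070
dashboard http 8082
EOF
}

# Look up a single port, e.g. `cordum_port api-gateway http`.
cordum_port() {
  cordum_ports | awk -v s="$1" -v k="$2" '$1 == s && $2 == k { print $3 }'
}

# Probe every listed port on a host (default: localhost, an assumption).
probe_ports() {
  host="${1:-localhost}"
  cordum_ports | while read -r svc kind port; do
    if nc -z -w 2 "$host" "$port" 2>/dev/null; then
      echo "open   $svc $kind $port"
    else
      echo "closed $svc $kind $port"
    fi
  done
}
```

Run `probe_ports` after startup to see which listeners respond; a closed port may only mean it is not published outside the container or cluster.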
Docker Compose
The default Docker Compose setup starts all 8 services with health checks, dependency ordering, and volume persistence. All config files are mounted from the config/ directory.
# Set API key (required)
export CORDUM_API_KEY="$(openssl rand -hex 32)"
# Start all services
docker compose up -d
# Verify health
docker compose ps
# Smoke test
bash ./tools/scripts/platform_smoke.sh
- CORDUM_API_KEY: Required. Fails startup if missing.
- NATS_USE_JETSTREAM=1: Durable delivery for scheduler, gateway, workflow engine.
- API_RATE_LIMIT_RPS=2000: Per-tenant rate limit.
- JOB_META_TTL=168h: Job metadata retention (7 days).
- REDIS_DATA_TTL=24h: Context/result pointer TTL.
- config/nats.conf → /etc/nats/nats.conf (NATS)
- config/pools.yaml → /etc/cordum/pools.yaml (Scheduler)
- config/timeouts.yaml → /etc/cordum/timeouts.yaml (Scheduler)
- config/safety.yaml → /etc/cordum/safety.yaml (Safety Kernel)
Volumes: nats_data and redis_data for persistence across restarts.
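Because a missing CORDUM_API_KEY stops the services from starting, it can help to check the environment before running Compose. This is a minimal sketch: `preflight_env` is a hypothetical helper, not a Cordum command, and the fallback values are simply the ones listed above.

```shell
# Fail early if CORDUM_API_KEY is missing, mirroring the startup behaviour
# described above. preflight_env is a hypothetical helper.
preflight_env() {
  if [ -z "${CORDUM_API_KEY:-}" ]; then
    echo "CORDUM_API_KEY is not set" >&2
    return 1
  fi
  # Fall back to the values from the list above unless already set.
  : "${NATS_USE_JETSTREAM:=1}"
  : "${API_RATE_LIMIT_RPS:=2000}"
  : "${JOB_META_TTL:=168h}"
  : "${REDIS_DATA_TTL:=24h}"
  export NATS_USE_JETSTREAM API_RATE_LIMIT_RPS JOB_META_TTL REDIS_DATA_TTL
  echo "env ok"
}
```

A typical use would be `preflight_env && docker compose up -d`.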
Startup ordering uses depends_on with condition: service_healthy.
Kubernetes
Kubernetes manifests use a base + overlay pattern via Kustomize. The base deploys all services as Deployments. The production overlay replaces NATS and Redis with StatefulSets, adds TLS, HA, monitoring, network policies, ingress, and backup CronJobs.
# Deploy base manifests
kubectl apply -f deploy/k8s/base.yaml
# Namespace: cordum
# All services as Deployments
# emptyDir volumes (no persistence)

# Deploy production overlay
kubectl apply -k deploy/k8s/production
# Adds: StatefulSet NATS (3 replicas)
#       StatefulSet Redis (6-node cluster)
#       TLS/mTLS for all services
#       NetworkPolicies
#       HPA + PodDisruptionBudgets
#       ServiceMonitors + PrometheusRules
#       Ingress (edit host first)
#       Backup CronJobs (hourly)
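After applying either set of manifests, you may want to block until every workload has rolled out. The sketch below uses `kubectl rollout status`, which is a standard kubectl subcommand. `wait_for_rollouts` is a hypothetical helper, and the 180s default timeout is an arbitrary assumption.

```shell
# Wait for every Deployment and StatefulSet in a namespace to finish rolling
# out. The default namespace comes from the base manifest notes above.
wait_for_rollouts() {
  ns="${1:-cordum}"
  timeout="${2:-180s}"
  # -o name prints one "kind/name" per line, e.g. deployment.apps/foo.
  resources="$(kubectl -n "$ns" get deployments,statefulsets -o name)" || return 1
  for resource in $resources; do
    kubectl -n "$ns" rollout status "$resource" --timeout="$timeout" || return 1
  done
}
```

For example, `kubectl apply -k deploy/k8s/production && wait_for_rollouts cordum 300s` returns non-zero if any workload fails to become ready in time.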
| Service | CPU (req / limit) | Memory (req / limit) | Replicas |
|---|---|---|---|
| API Gateway | 200m / 1000m | 256Mi / 1Gi | 2 (HPA: 2–10) |
| Scheduler | 150m / 750m | 256Mi / 768Mi | 2 (HPA: 2–10) |
| Safety Kernel | 100m / 500m | 128Mi / 512Mi | 2 |
| Workflow Engine | 150m / 750m | 256Mi / 768Mi | 1 |
| Context Engine | 100m / 500m | 128Mi / 512Mi | 1 |
| Dashboard | 100m / 500m | 128Mi / 512Mi | 1 |
| NATS | 200m / 1000m | 256Mi / 1Gi | 3 (StatefulSet) |
| Redis | 200m / 1000m | 256Mi / 1Gi | 6 (Cluster) |
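For capacity planning, the requests in the table can be summed across replicas. This is a rough sketch that counts HPA-managed services at their minimum of 2 replicas. The short service names and helper names are hypothetical.

```shell
# Requests and replica counts copied from the table above:
# "service cpu_request_millicores memory_request_Mi replicas".
resource_requests() {
  cat <<'EOF'
api-gateway 200 256 2
scheduler 150 256 2
safety-kernel 100 128 2
workflow-engine 150 256 1
context-engine 100 128 1
dashboard 100 128 1
nats 200 256 3
redis 200 256 6
EOF
}

# Sum CPU and memory requests over all replicas.
total_requests() {
  resource_requests | awk '{ cpu += $2 * $4; mem += $3 * $4 }
    END { printf "cpu=%dm memory=%dMi\n", cpu, mem }'
}

total_requests
```

With the table's values this prints `cpu=3050m memory=4096Mi`. At the HPA ceiling of 10 replicas each for API Gateway and Scheduler, the same sum comes to 5850m and 8192Mi, before any node or system overhead.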
ConfigMaps
- cordum-pools: topic-to-pool routing
- cordum-timeouts: reconciler and per-topic timeouts
- cordum-safety: safety kernel policy YAML
- cordum-nats-config: NATS server configuration
Required Secrets
- cordum-api-key: API authentication key
- cordum-nats-server-tls: NATS server cert/key/CA
- cordum-redis-server-tls: Redis server cert/key/CA
- cordum-client-tls: shared client cert for services
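Before applying the production overlay, it can be useful to confirm that all four Secrets exist. The sketch below compares a list of existing Secret names against the required ones. `required_secrets` and `missing_secrets` are hypothetical helpers.

```shell
# Secret names copied from the Required Secrets list above.
required_secrets() {
  printf '%s\n' cordum-api-key cordum-nats-server-tls \
    cordum-redis-server-tls cordum-client-tls
}

# Read existing secret names on stdin (one per line) and print the required
# ones that are absent. Prints nothing when all are present.
missing_secrets() {
  existing="$(cat)"
  for name in $(required_secrets); do
    printf '%s\n' "$existing" | grep -qxF "$name" || echo "$name"
  done
}
```

One way to feed it is `kubectl -n cordum get secrets -o name | sed 's|^secret/||' | missing_secrets`, since `-o name` prefixes each entry with `secret/`.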
Helm Charts
The Cordum Helm chart (v0.1.4) packages all services with configurable values. It supports external NATS/Redis, custom resource limits, ingress, and per-service toggles.
# Local chart (quote the placeholder so the shell does not treat < and > as redirection)
helm install cordum ./cordum-helm \
  -n cordum --create-namespace \
  --set secrets.apiKey="<your-api-key>"

# Published chart
helm repo add cordum https://charts.cordum.io
helm install cordum cordum/cordum \
  -n cordum --create-namespace

# External NATS/Redis (no embedded services)
helm install cordum ./cordum-helm \
  -n cordum --create-namespace \
  --set nats.enabled=false \
  --set redis.enabled=false \
  --set external.natsUrl=nats://nats.example.com:4222 \
  --set external.redisUrl=redis://redis.example.com:6379

# Upgrade
helm upgrade cordum ./cordum-helm -n cordum

# Port-forward for local access
kubectl -n cordum port-forward svc/cordum-api-gateway 8081:8081
kubectl -n cordum port-forward svc/cordum-dashboard 8082:8080
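Long `--set` chains are easier to review and version as a values file. The sketch below writes a file with the same keys as the external NATS/Redis example. The key names come from those flags, the hostnames are the same placeholders, and `write_external_values` and the default file name are hypothetical.

```shell
# Write a values file equivalent to the --set flags in the external
# NATS/Redis example, then print the file's path.
write_external_values() {
  out="${1:-values-external.yaml}"
  cat > "$out" <<'EOF'
nats:
  enabled: false
redis:
  enabled: false
external:
  natsUrl: nats://nats.example.com:4222
  redisUrl: redis://redis.example.com:6379
EOF
  echo "$out"
}
```

The file can then be passed with Helm's `-f` flag, for example `helm install cordum ./cordum-helm -n cordum --create-namespace -f "$(write_external_values)"`.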
Production Hardening
The production Kustomize overlay applies TLS, network isolation, HA, monitoring, and backup CronJobs. Review each item before deploying to production.
TLS / mTLS
Encrypt all in-cluster communication. NATS uses tls:// protocol with cert/key/CA. Redis uses rediss:// with tls-auth-clients. All services mount client TLS certs.
API Key Management
CORDUM_API_KEY is required in production (fails startup if missing). Rotate keys via CORDUM_API_KEYS_PATH with hot reload. Store in K8s Secrets or Vault.
Policy Signing
Set SAFETY_POLICY_PUBLIC_KEY for Ed25519 signature verification. Enable SAFETY_POLICY_SIGNATURE_REQUIRED to reject unsigned policy files.
Network Policies
The production overlay includes NetworkPolicies that restrict ingress and egress per service. The Dashboard can only reach the API Gateway. Metrics endpoints bind to loopback by default.
Pod Security
All services run as non-root (uid 65532) with a read-only filesystem, no privilege escalation, and seccompProfile: RuntimeDefault.
Monitoring & Alerts
Prometheus ServiceMonitors for all services (30s scrape). PrometheusRules alert on service downtime (5min threshold, critical severity).
Backup Strategy
Hourly CronJobs: Redis RDB snapshots + NATS stream snapshots to 20Gi PVC. Store off-cluster for disaster recovery.
RBAC Enforcement
Set CORDUM_REQUIRE_RBAC=true with an enterprise license. This gates policy, config, approval, and pack operations to authorized roles.
# All services
NATS_URL=tls://nats:4222
NATS_TLS_CA=/etc/cordum/tls/client/ca.crt
NATS_TLS_CERT=/etc/cordum/tls/client/tls.crt
NATS_TLS_KEY=/etc/cordum/tls/client/tls.key
REDIS_URL=rediss://redis:6379
REDIS_TLS_CA=/etc/cordum/tls/client/ca.crt
REDIS_TLS_CERT=/etc/cordum/tls/client/tls.crt
REDIS_TLS_KEY=/etc/cordum/tls/client/tls.key
# JetStream replication
NATS_JS_REPLICAS=3
# Redis Cluster
REDIS_CLUSTER_ADDRESSES=cordum-redis-{0..5}:6379
Before deploying production: Edit the Ingress host in deploy/k8s/production/ingress.yaml, create all TLS secrets, and set CORDUM_API_KEY in the cordum-api-key Secret.
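The `{0..5}` in the REDIS_CLUSTER_ADDRESSES value above reads as shorthand for the six pods cordum-redis-0 through cordum-redis-5. Env files and Kubernetes manifests do not perform brace expansion, and this guide does not say whether Cordum expands the pattern itself. If an explicit list is needed, it could be generated as sketched below. The comma-separated format and the helper name `redis_cluster_addresses` are assumptions.

```shell
# Print "prefix-0:port,prefix-1:port,..." for a StatefulSet with `count` pods.
# The comma-separated format is an assumption; check what Cordum expects.
redis_cluster_addresses() {
  prefix="${1:-cordum-redis}"
  count="${2:-6}"
  port="${3:-6379}"
  i=0
  list=""
  while [ "$i" -lt "$count" ]; do
    # Add a comma separator only when the list is already non-empty.
    list="${list:+$list,}${prefix}-${i}:${port}"
    i=$((i + 1))
  done
  echo "$list"
}
```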
Scaling Considerations
| Service | Strategy |
|---|---|
| API Gateway | HPA: 2–10 replicas (70% CPU / 80% memory). Stateless, scales linearly. |
| Scheduler | HPA: 2–10 replicas. Leader election via Redis locks — single active, others standby. |
| Safety Kernel | 2+ replicas. Stateless, round-robin load balanced. |
| Workflow Engine | Single replica recommended (stateful, requires coordination). |
| NATS | 3-node cluster (quorum: 2/3). JetStream replicas match cluster size (NATS_JS_REPLICAS=3). |
| Redis | 6-node cluster (3 primary + 3 replica). Add shards by scaling StatefulSet + rebalance. |
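To get a feel for how the HPA rows behave, the standard Kubernetes HPA rule can be worked through by hand: desired replicas = ceil(current replicas × observed utilization / target utilization), clamped to the configured range. The sketch below is a simplification that ignores the HPA's tolerance band and stabilization window and considers a single metric. `hpa_desired_replicas` is a hypothetical helper, with defaults taken from the API Gateway row.

```shell
# desired = ceil(current * utilization / target), clamped to [min, max].
# Defaults follow the API Gateway row: 70% CPU target, 2 to 10 replicas.
hpa_desired_replicas() {
  current="$1"
  utilization="$2"   # observed average utilization, percent of request
  target="${3:-70}"
  min="${4:-2}"
  max="${5:-10}"
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * utilization + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For instance, `hpa_desired_replicas 2 95` gives 3 (two replicas at 95% CPU scale to three), while `hpa_desired_replicas 8 140` is capped at 10.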
Persistence
- NATS JetStream: file-based persistence, 1s fsync
- Redis: AOF (appendonly yes) + RDB snapshots
- 20Gi PVCs per StatefulSet pod in production
High Availability
- PodDisruptionBudgets: maxUnavailable=1 per service
- Pod anti-affinity spreads replicas across nodes
- NATS 3-node quorum, Redis 3+3 cluster
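The 3-node quorum figure follows from majority voting: a cluster of n nodes needs floor(n/2) + 1 members to agree, so it tolerates n minus that many failures. A small sketch of the arithmetic, with hypothetical helper names:

```shell
# Majority quorum for an n-node cluster.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of nodes that can fail while a majority remains.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }
```

`quorum 3` gives 2 and `tolerated_failures 3` gives 1, matching the 2/3 quorum noted for NATS. A fourth node raises the quorum to 3 without tolerating more failures, which is why odd cluster sizes are the usual choice.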
Backups
- Redis: hourly RDB snapshots via CronJob
- NATS: hourly stream snapshots (CORDUM_SYS, CORDUM_JOBS)
- 20Gi backup PVC, copy off-cluster for DR
Single Binary Mode
For local development and quick demos, cordumctl up starts all core services in a single process with embedded NATS and Redis. No Docker required.
Use for: Local development, quick demos, CI/CD smoke tests
Not for: Production, multi-node, HA, or persistent workloads
Includes: Gateway, Scheduler, Safety Kernel, Workflow Engine, Context Engine, Dashboard
# Start all services (single binary)
cordumctl up
# Access
# API: http://localhost:8081
# Dashboard: http://localhost:8082
# Stop
cordumctl down
