Operations

The most important operational signals are gateway status, service health, Compose/container logs, and job lifecycle outcomes as they surface in the dead-letter queue (DLQ) and approvals.

Health endpoints
  • GET /health on the gateway for basic liveness.
  • GET /api/v1/status for gateway + NATS + Redis connectivity.
  • Gateway metrics on :9092/metrics by default.
  • Workflow engine health on WORKFLOW_ENGINE_HTTP_ADDR (default :9093).
  • Scheduler metrics on SCHEDULER_METRICS_ADDR (default :9090).
Status checks
cordumctl status
curl -sS https://localhost:8081/api/v1/status \
  --cacert ./certs/ca/ca.crt \
  -H "X-API-Key: $CORDUM_API_KEY" \
  -H "X-Tenant-ID: default" | jq
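
The endpoints listed under Health endpoints can be polled from one small helper when you want a quick pass/fail summary. The following is a minimal sketch, not part of the product tooling: the `probe` function and the `GATEWAY_URL` and `CA_CERT` variables are hypothetical names, and it assumes that a healthy endpoint answers with HTTP 200 and that `/health` is served on the same address as `/api/v1/status`.

```shell
# Sketch only: probe, GATEWAY_URL, and CA_CERT are illustrative names,
# not part of cordumctl or the gateway.
GATEWAY_URL="${GATEWAY_URL:-https://localhost:8081}"
CA_CERT="${CA_CERT:-./certs/ca/ca.crt}"

# probe NAME URL [extra curl args...]
# Prints "ok" or "FAIL" with the HTTP status code; returns non-zero on failure.
probe() {
  local name="$1" url="$2"
  shift 2
  local code
  # -w '%{http_code}' prints only the status code; the body is discarded.
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 5 "$@" "$url" 2>/dev/null)" || code="000"
  if [ "$code" = "200" ]; then
    echo "ok   $name ($code)"
  else
    echo "FAIL $name ($code)"
    return 1
  fi
}

# Example calls (assumed addresses; adjust to your deployment):
#   probe health "$GATEWAY_URL/health" --cacert "$CA_CERT"
#   probe status "$GATEWAY_URL/api/v1/status" --cacert "$CA_CERT" \
#     -H "X-API-Key: $CORDUM_API_KEY" -H "X-Tenant-ID: default"
```

A status code of `000` means curl could not complete the request at all (connection refused, TLS failure, or timeout), which is worth distinguishing from an HTTP error returned by a running service.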

Logs and recovery

Compose logs
docker compose ps
docker compose logs -f api-gateway scheduler safety-kernel workflow-engine dashboard

docker compose logs --tail=200 redis
docker compose logs --tail=200 nats
Scheduler notes
  • The scheduler reconciles stale jobs and marks timed-out work based on current timeout configuration.
  • Config reloads use NATS notifications and Redis polling, so pool and timeout changes propagate without restarting every replica.
  • DLQ inspection and retry use GET /api/v1/dlq and POST /api/v1/dlq/{job_id}/retry.
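
The two DLQ endpoints above can be wrapped in shell helpers that reuse the headers from the status check. This is a hedged sketch: `cordum_api`, `dlq_list`, `dlq_retry`, `GATEWAY_URL`, `CA_CERT`, and `CORDUM_TENANT_ID` are hypothetical names, and it assumes the DLQ endpoints accept the same `X-API-Key` and `X-Tenant-ID` headers as `/api/v1/status`. The response format is not described here, so the helpers print whatever the gateway returns.

```shell
# Sketch only: helper and variable names are illustrative assumptions.
# cordum_api METHOD PATH: send an authenticated request to the gateway.
cordum_api() {
  local method="$1" path="$2"
  curl -sS -X "$method" "${GATEWAY_URL:-https://localhost:8081}${path}" \
    --cacert "${CA_CERT:-./certs/ca/ca.crt}" \
    -H "X-API-Key: ${CORDUM_API_KEY:?set CORDUM_API_KEY}" \
    -H "X-Tenant-ID: ${CORDUM_TENANT_ID:-default}"
}

# List the jobs currently in the DLQ.
dlq_list() {
  cordum_api GET /api/v1/dlq
}

# Retry a single DLQ job by ID.
dlq_retry() {
  local job_id="${1:?usage: dlq_retry JOB_ID}"
  cordum_api POST "/api/v1/dlq/${job_id}/retry"
}
```

For example, `dlq_list | jq` shows the current entries, and `dlq_retry <job_id>` resubmits one of them.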

Reset local state

Destructive local reset
docker compose exec redis redis-cli FLUSHALL
docker compose down -v
Warning

This wipes Redis state and the Compose volumes used by NATS and Redis. Use it only for local development and reproducible test resets.
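
Because the reset is destructive, it can help to put the two commands behind an explicit confirmation. The wrapper below is a minimal sketch under that assumption. `reset_local_state` is a hypothetical function name, and the commands it runs are exactly the two shown above.

```shell
# Sketch only: reset_local_state is an illustrative wrapper, not a shipped tool.
# Asks for a typed confirmation before running the destructive reset commands.
reset_local_state() {
  local answer
  printf 'This wipes Redis state and the Compose volumes. Type "reset" to continue: '
  read -r answer
  if [ "$answer" != "reset" ]; then
    echo "aborted"
    return 1
  fi
  # Same commands as the documented reset, run in the same order.
  docker compose exec redis redis-cli FLUSHALL
  docker compose down -v
}
```

Run `reset_local_state` from the directory that contains the Compose file. Any input other than `reset` leaves the stack untouched.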