
AI Agent Safety Kernel TLS Hardening

A denied request over plaintext is still a control failure.

Deep Dive · 10 min read · Mar 2026
TL;DR
  • Safety checks are only as trustworthy as the transport carrying the request and the decision.
  • Cordum forces Safety Kernel server TLS in production and requires client CA material whenever TLS is required.
  • Client fallback to insecure transport happens only outside production, and only when TLS is not explicitly required.
  • The default production minimum TLS version is 1.3 unless you override `CORDUM_TLS_MIN_VERSION`.
Production default

Server startup fails in production if `SAFETY_KERNEL_TLS_CERT` is missing.

Downgrade control

Client transport logic blocks silent plaintext unless explicitly allowed outside strict mode.

Cert lifecycle

The server's certificate reload loop checks for keypair updates every 30 seconds.

Scope

This guide covers transport security between scheduler/gateway clients and Safety Kernel gRPC services. It does not cover browser TLS, ingress TLS, or generic mesh architecture.

The production problem

Safety checks sit on the hot path before dispatch. If that path can downgrade to plaintext transport, anyone on the network path can read or alter both the request and the allow/deny decision, and you have a control-plane blind spot.

An attacker who can intercept or reroute traffic between scheduler and Safety Kernel does not need a fancy model jailbreak. They need a network foothold and a weak transport policy.

This is why transport behavior has to be explicit at both ends: server listener posture and client credential selection.

What top results miss

| Source | Strong coverage | Missing piece |
|---|---|---|
| gRPC Authentication Guide | TLS and mTLS foundations for gRPC channels and auth primitives. | No environment-driven downgrade matrix for mixed-mode production control planes. |
| Istio PeerAuthentication Reference | STRICT vs. PERMISSIVE mTLS policy posture in service meshes. | No app-level fallback behavior for direct gRPC clients outside mesh policy enforcement. |
| SPIRE Use Cases | Runtime workload identity with short-lived, automatically rotated mTLS credentials. | No direct mapping to Safety Kernel request paths or scheduler fail-mode interactions. |

The operational gap is downgrade handling: what exactly happens when certs are missing, CA files are absent, or insecure flags appear in production configs.

Downgrade risk model

| State | Risk | Guardrail |
|---|---|---|
| Server has no cert/key in production | Kernel starts in plaintext or half-configured | Fail startup when `SAFETY_KERNEL_TLS_CERT` is not set in production |
| Client has no CA and TLS is required | Silent insecure dial to the kernel | Return an explicit error: `safety_kernel_tls_ca required` |
| Operator sets the insecure flag in production | Bypasses intended transport guarantees | Production always requires TLS on the client, so the insecure flag cannot skip the CA requirement |
| Outdated TLS protocol floor | Weaker transport properties | Keep the default production TLS 1.3 floor or set `CORDUM_TLS_MIN_VERSION=1.3` explicitly |

Cordum runtime behavior

| Boundary | Current behavior | Operational impact |
|---|---|---|
| Server keypair requirement | If only one of `SAFETY_KERNEL_TLS_CERT`/`SAFETY_KERNEL_TLS_KEY` is set, startup fails. | Prevents half-configured TLS from reaching runtime. |
| Server production posture | Production mode rejects startup without TLS certificate configuration. | No plaintext Safety Kernel server in production by default. |
| Cert reload | The TLS keypair reloader's watch loop runs every 30 seconds. | Supports cert rotation without a full redeploy. |
| Client TLS requirement | The client requires a CA in production or when `SAFETY_KERNEL_TLS_REQUIRED=true`. | Blocks accidental insecure dials when strict mode is expected. |
| Client insecure fallback | Insecure transport is allowed only when TLS is not required, which rules out production. | Keeps local/dev workflows possible while preserving production safety defaults. |
| TLS protocol floor | `CORDUM_TLS_MIN_VERSION` sets the minimum; the production default resolves to TLS 1.3. | Avoids stale protocol baselines in production control planes. |

Implementation examples

Client transport selection (Go)

safety_transport_credentials.go
Go
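func safetyTransportCredentials() (credentials.TransportCredentials, error) {
  caPath := strings.TrimSpace(os.Getenv("SAFETY_KERNEL_TLS_CA"))
  // Production always requires TLS; SAFETY_KERNEL_TLS_REQUIRED opts in elsewhere.
  requireTLS := env.IsProduction() || env.Bool("SAFETY_KERNEL_TLS_REQUIRED")
  insecureAllowed := env.Bool("SAFETY_KERNEL_INSECURE")

  if caPath == "" {
    if requireTLS {
      // Fail closed instead of silently dialing plaintext.
      return nil, fmt.Errorf("safety_kernel_tls_ca required")
    }
    if insecureAllowed || !env.IsProduction() {
      return insecure.NewCredentials(), nil
    }
    return nil, fmt.Errorf("safety kernel tls required")
  }

  // Trust only the configured CA when verifying the Safety Kernel server cert.
  caPEM, err := os.ReadFile(caPath)
  if err != nil {
    return nil, fmt.Errorf("read safety kernel CA %q: %w", caPath, err)
  }
  pool := x509.NewCertPool()
  if !pool.AppendCertsFromPEM(caPEM) {
    return nil, fmt.Errorf("safety kernel CA %q has no valid PEM certificates", caPath)
  }
  cfg := &tls.Config{
    RootCAs:    pool,
    MinVersion: tls.VersionTLS13, // production default; Cordum derives this from CORDUM_TLS_MIN_VERSION
  }
  return credentials.NewTLS(cfg), nil
}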

Safety Kernel server TLS gate (Go)

safety_kernel_tls_server.go
Go
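// Plaintext is only the starting point; production exits below without a cert.
serverCreds := grpc.Creds(insecure.NewCredentials())
cert := strings.TrimSpace(os.Getenv("SAFETY_KERNEL_TLS_CERT"))
key := strings.TrimSpace(os.Getenv("SAFETY_KERNEL_TLS_KEY"))

if cert != "" || key != "" {
  // A half-configured keypair is a startup error, never a silent downgrade.
  if cert == "" || key == "" {
    return fmt.Errorf("safety kernel tls requires both SAFETY_KERNEL_TLS_CERT and SAFETY_KERNEL_TLS_KEY")
  }
  reloader, err := tlsreload.NewCertReloader(cert, key, "safety-kernel")
  if err != nil {
    return fmt.Errorf("load safety kernel keypair: %w", err)
  }
  // Re-check the keypair every 30s so rotated certs apply without a restart.
  go reloader.WatchLoop(context.Background(), 30*time.Second)
  serverCreds = grpc.Creds(credentials.NewTLS(&tls.Config{GetCertificate: reloader.GetCertificate}))
}

// Refuse to serve the Safety Kernel over plaintext in production.
if env.IsProduction() && cert == "" {
  return fmt.Errorf("safety kernel tls required in production")
}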

Baseline production env

safety_kernel_tls.env
Bash
# Safety Kernel server
export SAFETY_KERNEL_TLS_CERT=/etc/cordum/tls/server.crt
export SAFETY_KERNEL_TLS_KEY=/etc/cordum/tls/server.key

# Scheduler/Gateway clients
export SAFETY_KERNEL_TLS_CA=/etc/cordum/tls/ca.crt
export SAFETY_KERNEL_TLS_REQUIRED=true
export SAFETY_KERNEL_INSECURE=false

# Global protocol floor
export CORDUM_TLS_MIN_VERSION=1.3

Limitations and tradeoffs

  • Strict TLS requirements improve integrity but raise rollout coupling across services.
  • CA path mistakes fail fast, which is safer, but can impact availability during bad deploys.
  • Cert rotation watchers reduce restart pressure, but still need tested key distribution paths.
  • Mesh-level mTLS helps, yet app-level TLS checks remain necessary for defense in depth.

`SAFETY_KERNEL_INSECURE=true` is a controlled exception for non-production testing, not a production shortcut.

Next step

Run this transport hardening sequence this week:

  1. Enforce server cert/key on all Safety Kernel pods and verify that startup blocks missing keypairs.
  2. Set `SAFETY_KERNEL_TLS_REQUIRED=true` and remove insecure exceptions from production values.
  3. Validate `SAFETY_KERNEL_TLS_CA` distribution to the scheduler and gateway before rollout.
  4. Execute one cert-rotation drill and confirm connections remain healthy across the 30-second watch window.
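Step 1 is easier to verify in CI if the server startup gate is factored into a pure function. The sketch below restates the gate from the server TLS snippet; `serverTLSDecision` is a hypothetical name, and callers are assumed to have already trimmed the environment values.

```go
package main

import "errors"

// serverTLSDecision restates the Safety Kernel server gate: it reports whether
// TLS will be used, or why startup must fail.
func serverTLSDecision(cert, key string, production bool) (useTLS bool, err error) {
	// Exactly one of cert/key set is a half-configured keypair.
	if (cert == "") != (key == "") {
		return false, errors.New("safety kernel tls requires both SAFETY_KERNEL_TLS_CERT and SAFETY_KERNEL_TLS_KEY")
	}
	// No keypair at all is only acceptable outside production.
	if production && cert == "" {
		return false, errors.New("safety kernel tls required in production")
	}
	return cert != "", nil
}
```

A table-driven test over the four combinations of cert, key, and environment covers every branch of the gate without starting a pod.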

Continue with Policy URL SSRF Hardening and LLM Safety Kernel.

Transport is governance plumbing

If safety decisions move over insecure channels, policy quality no longer matters.