Govern AI agents for Kubernetes triage

Control AI agents that diagnose and remediate Kubernetes issues. The Kubernetes triage pack enforces approval gates before agents take corrective actions like scaling, restarting pods, or modifying deployments.

What this pack does

  • Approval gates before pod restarts and scaling
  • Policy checks on deployment modifications
  • Namespace allowlist scoping for agent access
  • Incident response audit trail

Use cases

Require approval before an agent restarts or scales a production deployment

Scope agents to an allowlist of namespaces

Audit all automated incident remediation steps

Quick setup

  1. Install the pack: cordumctl pack install kubernetes-triage
  2. Configure kubeconfig with appropriate RBAC
  3. Define policy rules for destructive operations
  4. Enable the pack in your Cordum dashboard
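
Step 2 leaves "appropriate RBAC" open. As a rough sketch, a namespaced Role that covers read-only diagnostics plus the three write actions this page names (restarting deployments, scaling deployments, deleting pods) could be generated as below. The role name, the namespace, and the exact resource list are illustrative assumptions, not taken from the pack; check which API calls your installed version actually makes before relying on it.

```python
import json


def triage_role(namespace: str, name: str = "triage-agent") -> dict:
    """Build a namespaced RBAC Role manifest (hypothetical name and rule set)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # Read-only diagnostics on core resources.
            {"apiGroups": [""],
             "resources": ["pods", "pods/log", "events"],
             "verbs": ["get", "list", "watch"]},
            # Read-only diagnostics on workloads.
            {"apiGroups": ["apps"],
             "resources": ["deployments", "replicasets"],
             "verbs": ["get", "list", "watch"]},
            # A rollout restart patches the deployment's pod template.
            {"apiGroups": ["apps"],
             "resources": ["deployments"],
             "verbs": ["patch"]},
            # Scaling goes through the scale subresource.
            {"apiGroups": ["apps"],
             "resources": ["deployments/scale"],
             "verbs": ["get", "patch", "update"]},
            # Deleting individual pods.
            {"apiGroups": [""],
             "resources": ["pods"],
             "verbs": ["delete"]},
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(triage_role("staging"), indent=2))
```

Because a Role is namespaced, you would bind one per namespace you intend to expose, which keeps cluster permissions aligned with the allowlist.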

Frequently asked questions

Can agents inspect a cluster without being able to change it?

Yes. The triage worker runs read-only kubectl inspections by default and treats remediation commands as write actions that require approval. Agents can gather diagnostics freely while corrective steps like scaling or restarts stay gated.
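
To make the read/write split concrete, here is a minimal sketch of how a kubectl command line could be sorted into the two groups. The verb lists and the `classify` function are assumptions for illustration, not the worker's actual rules; the sketch also assumes the verb is the first argument after `kubectl`. Anything it does not recognise is treated as a write, so unknown commands end up behind the gate.

```python
# Verbs that only inspect cluster state (assumed list, not the pack's own).
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "events"}

# Two-word commands whose first word is otherwise a write (e.g. rollout restart).
READ_ONLY_SUBCOMMANDS = {("rollout", "status"), ("rollout", "history")}


def classify(argv: list[str]) -> str:
    """Return 'read' or 'write' for a command such as ['kubectl', 'get', 'pods']."""
    args = argv[1:] if argv[:1] == ["kubectl"] else list(argv)
    if not args:
        return "write"  # nothing recognisable: fail closed
    if args[0] in READ_ONLY_VERBS:
        return "read"
    if tuple(args[:2]) in READ_ONLY_SUBCOMMANDS:
        return "read"
    return "write"  # default: require approval
```

Under these assumptions, `kubectl get pods -n prod` classifies as a read, while `kubectl rollout restart deployment/api` and `kubectl scale deployment/api --replicas=3` classify as writes.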

How do I keep agents out of sensitive namespaces?

The worker enforces an allowed-namespaces list and runs under the RBAC of the kubeconfig you provide, so an agent can only act where you have granted access. The pack's write actions are limited to restarting deployments, scaling deployments, and deleting pods, and you can deny specific write actions outright in policy.
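
The scoping described above can be pictured as two checks applied to every write request: is the namespace on the allowlist, and has the action been denied in policy? The following is a sketch only; the `Scope` class and the action identifiers (`restart_deployment`, `scale_deployment`, `delete_pod`) are hypothetical names chosen for this example, not the pack's schema.

```python
from dataclasses import dataclass

# The three write actions this page lists, under assumed identifiers.
WRITE_ACTIONS = frozenset({"restart_deployment", "scale_deployment", "delete_pod"})


@dataclass(frozen=True)
class Scope:
    allowed_namespaces: frozenset
    denied_actions: frozenset = frozenset()

    def permits_write(self, action: str, namespace: str) -> bool:
        """True only if the action is known, not denied, and in an allowed namespace."""
        if namespace not in self.allowed_namespaces:
            return False
        if action in self.denied_actions:
            return False
        return action in WRITE_ACTIONS
```

For example, with `Scope(frozenset({"staging", "payments"}), frozenset({"delete_pod"}))`, a scale request in `payments` passes this check, while any request in `kube-system` or any `delete_pod` request is refused. Passing the check would still leave the action subject to approval and to the cluster's own RBAC.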

Will an agent restart pods in production without review?

Not unless you allow it. Approval gates sit in front of pod restarts, scaling, and deployment changes: the Safety Kernel returns Require Approval for those actions, and no remediation reaches the cluster until a human approves it.
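
The "hold until approved" behaviour can be sketched as a small queue: a write action is parked when submitted and only handed to the executor once a reviewer approves it. `ApprovalGate` and its methods are hypothetical and stand in for whatever the Safety Kernel does internally.

```python
from typing import Any, Callable


class ApprovalGate:
    """Park write actions until a human approves or rejects them (illustrative)."""

    def __init__(self, execute: Callable[[Any], Any]):
        self._execute = execute
        self._pending: dict[int, Any] = {}
        self._next_ticket = 1

    def submit(self, action: Any) -> int:
        """Queue an action and return a ticket; nothing runs yet."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._pending[ticket] = action
        return ticket

    def approve(self, ticket: int) -> Any:
        """Run the queued action. Raises KeyError for unknown or already-used tickets."""
        action = self._pending.pop(ticket)
        return self._execute(action)

    def reject(self, ticket: int) -> None:
        """Discard the queued action without running it."""
        self._pending.pop(ticket)
```

Removing the ticket on approval or rejection means a decision cannot be replayed, so each remediation runs at most once per review.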

What does a reviewer see before approving a remediation?

The approval shows the exact requested action and its target, such as scaling or restarting a named deployment in a specific namespace, so the reviewer knows precisely what will run before they sign off. Every diagnostic and remediation step is then recorded in the audit trail for the post-incident review.
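
As an illustration of what "exact action and target" might look like in practice, the sketch below renders a one-line summary for the reviewer and a JSON audit entry for the outcome. The function names and field names are assumptions made for this example; the pack's real approval and audit formats are not documented on this page.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def approval_summary(action: str, namespace: str, target: str,
                     params: Optional[dict] = None) -> str:
    """Render the requested action, its target, and any parameters on one line."""
    detail = ", ".join(f"{k}={v}" for k, v in sorted((params or {}).items()))
    suffix = f" ({detail})" if detail else ""
    return f"{action} {target} in namespace {namespace}{suffix}"


def audit_entry(summary: str, reviewer: str, outcome: str,
                when: Optional[datetime] = None) -> str:
    """Serialize one audit record as a JSON line (hypothetical field names)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {"time": when.isoformat(), "summary": summary,
         "reviewer": reviewer, "outcome": outcome},
        sort_keys=True,
    )
```

Calling `approval_summary("scale", "payments", "deployment/api", {"replicas": 5})` yields `scale deployment/api in namespace payments (replicas=5)`, which is the kind of unambiguous description a reviewer needs before signing off.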
