Glossary

LLM-as-a-Judge

LLM-as-a-judge is a technique that uses a language model to evaluate outputs — scoring quality, checking criteria, or comparing responses — in place of a human rater or a fixed rule. It is well-suited to grading subjective quality, but its probabilistic nature makes it a poor fit for enforcing safety decisions.

Definition

What LLM-as-a-judge is good at

Using one model to evaluate another scales human judgment for tasks that resist rigid rules: is this summary faithful, is this answer helpful, does this response match a rubric? For evaluation pipelines and offline quality measurement, LLM-as-a-judge is a practical, widely used tool. Its strength is handling nuance and natural language at volume — work that would otherwise require many human raters.
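To make the evaluation use concrete, here is a minimal Python sketch of a rubric-based faithfulness judge. The rubric wording, the `SCORE: <n>` reply format, and the helper names (`build_judge_prompt`, `parse_score`, `judge_dataset`) are hypothetical. The model call is passed in as a plain function, so no particular LLM client or API is assumed.

```python
import re
from statistics import mean
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical rubric; a real evaluation pipeline would define its own criteria.
RUBRIC = (
    "Score the SUMMARY for faithfulness to the SOURCE on a 1-5 scale.\n"
    "5 = every claim is supported by the source; 1 = mostly unsupported.\n"
    "Reply with a single line: SCORE: <n>"
)


def build_judge_prompt(source: str, summary: str) -> str:
    """Combine the rubric and the item under evaluation into one prompt."""
    return f"{RUBRIC}\n\nSOURCE:\n{source}\n\nSUMMARY:\n{summary}\n"


def parse_score(reply: str) -> Optional[int]:
    """Extract a 1-5 score; return None if the judge ignored the reply format."""
    match = re.search(r"SCORE:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge_dataset(
    items: Iterable[Tuple[str, str]],
    call_judge: Callable[[str], str],
) -> float:
    """Average the judge's scores over (source, summary) pairs.

    Unparseable replies are skipped rather than counted as a score.
    """
    scores = []
    for source, summary in items:
        score = parse_score(call_judge(build_judge_prompt(source, summary)))
        if score is not None:
            scores.append(score)
    return float(mean(scores)) if scores else float("nan")


if __name__ == "__main__":
    # Hypothetical stand-in for a real model call: rates the summary "ok" highly.
    def stub_judge(prompt: str) -> str:
        return "SCORE: 5" if prompt.rstrip().endswith("ok") else "SCORE: 1"

    print(judge_dataset([("source text", "ok"), ("source text", "off-topic")], stub_judge))
    # prints 3.0
```

Because a judge's output is sampled text, the sketch parses defensively and reports an aggregate over many items. Evaluation can tolerate that noise. A single allow/deny decision cannot.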

Why it is the wrong layer for enforcement

The same property that makes a judge model flexible — it is probabilistic — makes it unreliable as a gate on consequential actions. Ask 'should this agent be allowed to delete this database?' and an LLM judge may answer differently across runs, can be swayed by prompt injection, and cannot point to the rule it applied. Governance enforcement needs the opposite: deterministic policy that returns the same decision every time, resists manipulation, and is fully auditable. The pragmatic pattern is to use LLM-as-a-judge for quality evaluation and deterministic policy — like Cordum's Safety Kernel — for the action decisions that touch real systems. Judge the answer with a model; govern the action with a rule.
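For contrast, here is a minimal sketch of what deterministic enforcement can look like: an ordered rule list evaluated as a pure function, with first-match-wins semantics and a default of deny. The `Action`, `Rule`, and `Decision` types, the rule IDs, and the resource naming scheme are hypothetical illustrations. This is not Cordum's Safety Kernel API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    agent: str
    verb: str       # e.g. "read", "delete"
    resource: str   # e.g. "db/prod/orders" (hypothetical naming scheme)


@dataclass(frozen=True)
class Rule:
    rule_id: str
    verb: str
    resource_prefix: str
    decision: str   # "allow", "deny", or "require_approval"


@dataclass(frozen=True)
class Decision:
    decision: str
    rule_id: Optional[str]  # which rule fired, recorded for the audit trail


# Hypothetical policy: rules are checked in order and the first match wins.
POLICY: Sequence[Rule] = (
    Rule("deny-prod-delete", "delete", "db/prod/", "deny"),
    Rule("approve-staging-delete", "delete", "db/staging/", "require_approval"),
    Rule("allow-reads", "read", "db/", "allow"),
)


def evaluate(action: Action, policy: Sequence[Rule] = POLICY) -> Decision:
    """Pure function of (action, policy): the same inputs always yield the same decision."""
    for rule in policy:
        if action.verb == rule.verb and action.resource.startswith(rule.resource_prefix):
            return Decision(rule.decision, rule.rule_id)
    # Default-deny: anything no rule explicitly matches is refused.
    return Decision("deny", None)
```

Calling `evaluate(Action("agent-7", "delete", "db/prod/orders"))` returns `Decision(decision='deny', rule_id='deny-prod-delete')` on every run. `evaluate` only compares fields against fixed rules, so text inside an action is never interpreted as instructions. Every decision also names the rule that produced it.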

Frequently asked questions

Can I use an LLM to decide whether an agent action is allowed?

It is risky. An LLM judge is probabilistic, can be manipulated by prompt injection, and cannot show the rule it applied — so the same action might be allowed once and blocked the next time. For enforcement, deterministic policy that returns reproducible, auditable decisions is the safer layer.

When is LLM-as-a-judge appropriate?

For evaluating subjective quality at scale — faithfulness, helpfulness, rubric adherence — where a fixed rule cannot capture the nuance. Keep it in your evaluation pipeline, and keep deterministic policy in the action-enforcement path.

Govern your AI agents with Cordum

Cordum is the agent control plane: policy-before-dispatch enforcement, human approvals, and a tamper-evident audit trail for autonomous AI agents.
