The groundedness, blast-radius and justification checks run in the agent as pure, unit-tested functions, backed by the LLM-as-judge — so a wrong output can’t reach the cluster even if a platform check is lenient. These in-agent gates are the core fail-safe logic; the platform guardrails are defense-in-depth around them.
PII / secret redaction
Before the model ever sees the gathered logs, I mask secrets and PII. The cluster signals deliberately include a leaked postgres://… credential line so you can watch this work. On the platform side this is the native Secrets Detection + PII/PHI guardrails running in mutate mode on the LLM Input hook, masking credentials/tokens/PII pre-prompt.
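To make the masking step concrete, here is a minimal sketch of an in-agent redaction pass. It is not the platform guardrail itself: the `redact` function and both regex patterns are hypothetical stand-ins, and a real detector would cover far more secret and PII formats.

```python
import re

# Hypothetical patterns for illustration only; not the platform's detectors.
# Matches "scheme://user:password@" so the credentials can be masked
# while the scheme and host stay readable for debugging.
CREDENTIAL_URL = re.compile(r"\b([a-z][a-z0-9+.-]*://)[^\s:/@]+:[^\s@]+@")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Mask URL-embedded credentials first, then email addresses."""
    text = CREDENTIAL_URL.sub(r"\1[REDACTED]@", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)


if __name__ == "__main__":
    line = "db error: postgres://admin:s3cr3t@prod-db:5432/app unreachable"
    print(redact(line))
```

Running the pass before prompt assembly means the model only ever sees the masked line, so a leaked credential cannot be echoed back in a diagnosis.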
The quality gate (groundedness)
Rule-based groundedness validates the structured Diagnosis field-by-field:
- suspected_resource must be a real service.
- suspected_deploy_sha must be a real recent deploy.
- confidence must be ≥ 0.5.
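The three field rules can be sketched as one pure function. The field names come from the rules above; the `Diagnosis` dataclass shape, the `check_groundedness` name, and the idea of passing the known services and recent deploy SHAs in as sets are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

MIN_CONFIDENCE = 0.5  # threshold taken from the rule above


@dataclass(frozen=True)
class Diagnosis:
    # Assumed shape: only the three fields named in the rules.
    suspected_resource: str
    suspected_deploy_sha: str
    confidence: float


def check_groundedness(
    diagnosis: Diagnosis,
    known_services: Set[str],
    recent_deploy_shas: Set[str],
) -> List[str]:
    """Return a list of failures; an empty list means the diagnosis is grounded."""
    failures = []
    if diagnosis.suspected_resource not in known_services:
        failures.append(f"unknown service: {diagnosis.suspected_resource}")
    if diagnosis.suspected_deploy_sha not in recent_deploy_shas:
        failures.append(f"unknown deploy: {diagnosis.suspected_deploy_sha}")
    if diagnosis.confidence < MIN_CONFIDENCE:
        failures.append(f"confidence {diagnosis.confidence} below {MIN_CONFIDENCE}")
    return failures
```

Returning every failure rather than stopping at the first keeps the function easy to unit test and gives the operator the full picture in one pass.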
The LLM-as-judge
Rule checks catch structural problems; an independent judge call reasons about whether the action is actually justified by the evidence. On the live cluster it returned: “all signals point to an ImagePullBackOff… prod-db is not implicated by any metric or log error,” rejecting the hallucination in plain English. It’s resilient (errors never block), runs a cheap judge model to bound cost, and is gated behind the BACKSTOP_LLM_JUDGE flag.
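A rough sketch of that wrapper is below. I am assuming here that BACKSTOP_LLM_JUDGE is read as an environment variable and that the judge is an injected callable returning an (approved, reason) pair; `judge_action` and those shapes are hypothetical, and no particular model API is implied.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

Verdict = Tuple[bool, str]  # (approved, reason)


def judge_action(
    action: str,
    evidence: str,
    judge: Callable[[str, str], Verdict],
    env: Optional[Mapping[str, str]] = None,
) -> Verdict:
    """Ask the judge for a verdict, failing open on any judge error."""
    env = os.environ if env is None else env
    # Assumption: the flag is an environment variable with truthy string values.
    if env.get("BACKSTOP_LLM_JUDGE", "").lower() not in {"1", "true", "yes"}:
        return True, "judge disabled"
    try:
        return judge(action, evidence)
    except Exception as exc:
        # Errors never block: the deterministic gates still run regardless.
        return True, f"judge unavailable: {exc}"
```

Failing open is only safe because the judge is an extra layer. The deterministic checks still decide whether a write is allowed, so a judge outage degrades to rule-only validation instead of halting remediation.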
The action-validation gate
check_action runs before any write — it’s the real differentiator. As pure, deterministic, testable functions it enforces:
- Blast radius: reject scope=all.
- Protected resources: reject prod-db and payments.
- Target exists: confirm the action’s target is real.
- Matches evidence: confirm the action matches the diagnosis.
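The four rules above can be sketched as a single pure function. Only the check_action name, scope=all, and the two protected resource names come from this post; the `Action` dataclass, the argument list, and reading "matches evidence" as "the target equals the diagnosed resource" are my assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

PROTECTED = frozenset({"prod-db", "payments"})


@dataclass(frozen=True)
class Action:
    # Assumed shape for a proposed remediation.
    kind: str
    target: str
    scope: str = "single"


def check_action(
    action: Action,
    suspected_resource: str,
    known_resources: Set[str],
) -> List[str]:
    """Return every violated rule; an empty list means the write may proceed."""
    violations = []
    if action.scope == "all":
        violations.append("blast radius: scope=all is not allowed")
    if action.target in PROTECTED:
        violations.append(f"protected resource: {action.target}")
    if action.target not in known_resources:
        violations.append(f"target does not exist: {action.target}")
    # Assumption: an action matches the evidence when it targets the
    # resource named in the diagnosis.
    if action.target != suspected_resource:
        violations.append("action target does not match the diagnosis")
    return violations
```

Because the function takes plain values and touches no cluster state, each rule can be covered by a fast unit test, which is what makes the gate trustworthy as a last check before a write.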

