AI Gateway + Virtual Models
I point the OpenAI SDK at the gateway (base_url + a virtual key) and call a single virtual model, prod-triage, that I configured with a priority fallback chain over AWS Bedrock: Claude Sonnet → Llama 4 Maverick → Amazon Nova Pro → Claude Haiku, with retry/fallback on 401/403/404/408/429/5xx. The agent calls one model name; the gateway handles failover.
I also attached a rate-limit policy (to force and demonstrate live failover) and a budget/cost-limit policy across the chain. Every call carries X-TFY-LOGGING-CONFIG, so request traces, fallback events, and per-model cost land in AI Monitoring.
Guardrails
I registered a guardrail group and applied it toprod-triage with a policy that runs on both the LLM Input and LLM Output hooks:
- Input (redact): native Secrets Detection + PII/PHI guardrails, in mutate mode, mask credentials/tokens/PII before the model sees them.
- Output (validate): a custom guardrail I built and host (
/tfy/quality) validates the model’s response shape and confidence. - In-agent (the core fail-safe logic): the groundedness, blast-radius and justification checks run in the agent as pure, unit-tested functions, backed by the LLM-as-judge — so a wrong output can’t reach the cluster even if a platform check is lenient.
MCP Gateway
- Official remote MCP (Linear): on resolution, the agent pages on-call and files an incident ticket through a curated virtual MCP server that exposes only safe tools (ticket creation), with destructive tools toggled off and auth managed centrally.
- Custom MCP endpoint: I built a read-only “infra” MCP server with FastMCP (
get_signals,deployment_status,namespaces) and connected it to the gateway over streamable-http, so live cluster state is reachable through the MCP layer with a full audit trail. I kept it strictly read-only by design — no destructive tool is ever exposed.

