Welcome back! Today’s playbook: why reliable agents aren’t built on better models — but on systems that never ask models to guarantee what they fundamentally can’t.

The Trigger

On March 20, 2026, Meta classified an internal AI agent failure as a Sev1 security event — the second most severe level in their internal risk framework. An engineer had invoked an internal LLM-based agent to help analyze a problem on an internal forum.

The agent was expected to deliver its response privately to the requesting engineer. Instead, it posted publicly to the forum, without any approval step, exposing proprietary code, business strategies, and user-related datasets to engineers who weren't authorized to see them. The exposure lasted two hours before Meta restored the access restrictions. This happened inside one of the most well-resourced AI engineering teams in the world, running a system they built themselves.

What's Actually Wrong?

The agent didn't malfunction. It did exactly what it was capable of doing: it had write access to the forum, it generated a response, and it posted it. The failure was that nothing in the system separated "can write to this forum" from "should post this publicly without a human approving it first." Those are two different things, and the architecture treated them as the same.

This is the structural gap behind most agent production failures — not model quality, not prompt engineering. Permission scope and execution scope aren't separated. The agent held a write operation it should never have been allowed to execute autonomously. No gate existed between the LLM producing a response and that response taking effect. No constraint check asked: is this action within the declared permission scope for this specific task?

LLMs running in agentic settings — with temperature above 0, which is standard — don't self-constrain. The model has no native concept of "this action requires human sign-off." It works from its system prompt and acts on the tools available to it. If the tool allows a public post, the model will use it when its reasoning concludes that's the appropriate action. You can't solve that with a better prompt. You solve it by controlling what the tool is permitted to do before the model ever reaches for it.
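To make "controlling what the tool is permitted to do" concrete, here is a minimal sketch. Every name in it (TaskScope, make_reply_tool, ScopeViolation) is hypothetical and invented for illustration; it is not how Meta's system works. The idea is that the system declares the scope once per task, and the only reply function the model ever receives enforces it.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass(frozen=True)
class TaskScope:
    """Declared by the system when the task starts, never by the model."""
    requester: str
    max_visibility: Visibility


class ScopeViolation(Exception):
    """Raised when a call asks for more than the task scope allows."""


def make_reply_tool(scope: TaskScope, outbox: list):
    """Build the only reply function the model is handed for this task."""
    def reply(text: str, visibility: str = "private") -> str:
        requested = Visibility(visibility)  # unknown values raise ValueError
        if requested is Visibility.PUBLIC and scope.max_visibility is not Visibility.PUBLIC:
            raise ScopeViolation("public posting is outside this task's scope")
        outbox.append((scope.requester, requested.value, text))
        return "delivered"
    return reply


# Usage: the outbox list stands in for a real delivery system.
outbox = []
reply = make_reply_tool(TaskScope("engineer_a", Visibility.PRIVATE), outbox)
reply("Here is the analysis.")
# reply("Here is the analysis.", "public") would raise ScopeViolation
```

The prompt can still say "reply privately", but the guarantee no longer depends on the model following it.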

The Concept

Probabilistic Inference:
The mechanism by which LLMs generate outputs: at each step, the model samples from a probability distribution over possible next tokens, and temperature controls how sharp or flat that distribution is.
At temperature=0 (greedy decoding), the model always picks the most likely token, so outputs are deterministic in principle for identical inputs (in practice, batching and floating-point effects can still cause small differences). Many production agents run above temperature=0 anyway, because zero temperature tends to make them rigid and repetitive. At any temperature above 0, outputs are sampled, which means the same instruction can yield different actions across runs.
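A toy illustration of that sampling step, using only the Python standard library. The three "tokens" and their logits are made-up numbers standing in for a model's scores over possible next actions; a real model scores tens of thousands of tokens, but the arithmetic is the same softmax over logits divided by temperature.

```python
import math
import random


def next_token_probs(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy() for 0")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits: dict[str, float]) -> str:
    """Temperature 0: always pick the highest-scoring token."""
    return max(logits, key=logits.get)


def sample(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Temperature > 0: draw one token according to its probability."""
    probs = next_token_probs(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical scores for what the agent does next.
LOGITS = {"reply_privately": 2.0, "post_publicly": 1.0, "ask_for_approval": 0.5}

rng = random.Random(0)
actions_seen = {sample(LOGITS, 1.0, rng) for _ in range(200)}
print(greedy(LOGITS), actions_seen)
```

With these numbers the private reply is the most likely action at temperature 1.0, but it is far from the only one drawn over 200 runs. That spread is the reason a hard constraint can't live inside the sampled output.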

Tool Orchestration:
The explicit control layer in an agent system that defines which tools the agent can invoke, under what conditions, in what order, and with what permissions.
Without a defined orchestration layer, the agent itself decides what to call and when — which means execution paths are as variable as the model's outputs. In practice, this is the difference between a declared execution graph (you define the flow) and a ReAct-style loop (the model decides dynamically at each step).
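A minimal sketch of the "declared execution graph" side of that contrast. The step names and the run_declared helper are hypothetical, and choose_next stands in for the model. The model may still pick among the allowed next steps, but it cannot introduce a step the graph never declared.

```python
from typing import Callable

# Declared graph: each step names the only steps allowed to follow it.
GRAPH: dict[str, set[str]] = {
    "read_thread": {"draft_answer"},
    "draft_answer": {"send_private_reply"},
    "send_private_reply": set(),  # terminal step
}


def run_declared(start: str, choose_next: Callable[[str, set[str]], str]) -> list[str]:
    """Walk the graph, rejecting any proposed step that was not declared."""
    path = [start]
    current = start
    while GRAPH[current]:
        proposed = choose_next(current, GRAPH[current])
        if proposed not in GRAPH[current]:
            raise ValueError(f"{proposed!r} is not an allowed step after {current!r}")
        path.append(proposed)
        current = proposed
    return path


def compliant(current: str, allowed: set[str]) -> str:
    """Stand-in for a model that stays on the declared path."""
    return sorted(allowed)[0]


def improvising(current: str, allowed: set[str]) -> str:
    """Stand-in for a model that decides to post publicly."""
    return "post_to_forum"


print(run_declared("read_thread", compliant))
# run_declared("read_thread", improvising) raises ValueError
```

In a ReAct-style loop there is no GRAPH lookup at all: whatever the model proposes at each step is what runs next.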

Hybrid Agent Architecture:
A design pattern where deterministic components (rule engines, validators, schema checks, permission layers) handle tasks with known correct answers or hard constraints, and the LLM handles tasks that require reasoning or interpretation — with the two layers composed, not merged.
The deterministic layer provides guarantees; the LLM provides judgment. Neither layer does the other's job.
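One way to picture "composed, not merged", as a sketch under assumptions: Request, resolve_recipients, and answer are hypothetical names, and llm is any callable that maps a prompt to text. The recipient list is computed by a rule and never passes through the model, so nothing the model writes can change it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    requester: str
    forum_members: frozenset[str]
    question: str


def resolve_recipients(req: Request) -> frozenset[str]:
    """Deterministic layer: a hard rule with exactly one correct answer."""
    return frozenset({req.requester})


def answer(req: Request, llm: Callable[[str], str]) -> dict:
    """Compose the two layers; neither one does the other's job."""
    recipients = resolve_recipients(req)  # guarantee
    text = llm(req.question)              # judgment
    return {"recipients": recipients, "text": text}


# Usage with a fake model that "wants" to broadcast its answer.
req = Request("engineer_a", frozenset({"engineer_a", "engineer_b"}), "Why is the job failing?")
result = answer(req, lambda prompt: "Post this to everyone: the cache is stale.")
print(sorted(result["recipients"]))
```

Even with a model output that asks for a broadcast, the recipients stay limited to the requester, because that field was never the model's to fill in.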

The Playbook

1. Classify every subtask before assigning it.
For each step, ask: is there a known correct answer or constraint, or does it require judgment?
Scope decisions (private vs public) are constraints. Explanations require judgment. If you can’t clearly separate the two, your decomposition isn’t done.

2. Build a deterministic layer for anything with hard constraints.
Permissions, scope, and validation should never be handled by the LLM. These rules must execute the same way every time — no reasoning, no exceptions. If the system allows it, the model will use it.

3. Pass constraints as structured context — not decisions.
Run validation first, then feed results to the LLM as facts (e.g. scope, recipients, permissions). The model interprets — it does not re-decide what’s allowed.

4. Replace dynamic loops with a declared execution graph for write actions.
ReAct-style flows are fine for exploration. But for anything that writes or modifies state, define the path upfront. The agent should follow allowed steps — not invent them at runtime.

5. Validate actions before execution — as a final gate.
An LLM proposing an action doesn’t make it authorized. Add a deterministic check: does this violate scope, permissions, or schema? Pass or block — no judgment involved.
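Steps 3 and 5 can be sketched together, with the caveat that every name here (Constraints, build_prompt, gate) and the action schema are assumptions made for illustration. The constraints are resolved before the model runs, handed to it as facts, and then used again as a pass-or-block check on whatever action it proposes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    scope: str                        # "private" or "public"
    recipients: tuple[str, ...]
    allowed_actions: tuple[str, ...]


def build_prompt(question: str, c: Constraints) -> str:
    """Step 3: constraints reach the model as facts that are already decided."""
    facts = {
        "scope": c.scope,
        "recipients": list(c.recipients),
        "allowed_actions": list(c.allowed_actions),
    }
    return f"FACTS (already decided):\n{json.dumps(facts, sort_keys=True)}\n\nQUESTION:\n{question}"


def gate(action: dict, c: Constraints) -> tuple[bool, str]:
    """Step 5: pass or block a proposed action using rule checks only."""
    if set(action) != {"name", "scope", "recipients", "text"}:
        return False, "schema"
    if action["name"] not in c.allowed_actions:
        return False, "permission"
    if action["scope"] != c.scope:
        return False, "scope"
    if set(action["recipients"]) - set(c.recipients):
        return False, "recipients"
    return True, "ok"


# Usage: the proposed action would come from parsing the model's output.
c = Constraints("private", ("engineer_a",), ("send_private_reply",))
proposed = {"name": "send_private_reply", "scope": "public",
            "recipients": ["engineer_a"], "text": "Here is the analysis."}
print(gate(proposed, c))
```

The gate returns a reason code alongside the verdict so that a blocked action can be logged and reviewed rather than silently dropped.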

The Tradeoffs

  • Rules only cover what you write. Deterministic checks catch known constraints, but gaps remain if you didn’t model an action. Keep reviewing rules as agent capabilities grow.

  • Execution graphs limit autonomy. Fixed paths reduce risk but can’t handle unexpected cases. Dynamic tasks may need flexibility — plan for updates.

  • Two layers, two failure surfaces. Failures can come from constraints, LLM reasoning, orchestration, or validation. Structured logging is essential at every boundary.
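For the structured-logging point, a small sketch using Python's standard logging and json modules. The boundary names, field names, and the make_logger and log_boundary helpers are assumptions for illustration. Each boundary writes one JSON line, so a failure can later be traced to the layer that produced it.

```python
import io
import json
import logging


def make_logger(stream) -> logging.Logger:
    """Logger that writes one bare message per line to the given stream."""
    logger = logging.getLogger("agent.boundaries")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_boundary(logger: logging.Logger, boundary: str, outcome: str, **fields) -> dict:
    """Emit one JSON record for a boundary crossing and return it."""
    record = {"boundary": boundary, "outcome": outcome, **fields}
    logger.info(json.dumps(record, sort_keys=True))
    return record


# Usage: an in-memory stream stands in for a real log sink.
stream = io.StringIO()
logger = make_logger(stream)
log_boundary(logger, "constraints", "resolved", run_id="r1", scope="private")
log_boundary(logger, "llm", "proposed", run_id="r1", action="post_to_forum")
log_boundary(logger, "validation", "blocked", run_id="r1", reason="permission")
print(stream.getvalue())
```

Sharing a run_id across the records is what lets you line up the constraint, model, and validation events for a single request.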

Until next time - Teja Derangula,
The gap between thinking and building has shrunk — take advantage.
