The most dangerous sentence in agent infrastructure is still this one: “The model will only call the tool when it makes sense.”

That sounds reasonable right up until a prompt, a resource, or a tool description quietly teaches the model to do something the operator never meant to authorize. The Model Context Protocol makes tool interoperability far more structured, but structure is not the same thing as safety. The real work is in permission boundaries.

The boundary is not “tool or no tool”

Teams often think about risk in a binary way:

  • tools enabled
  • tools disabled

That is too crude for real systems. A better mental model is four layers:

  1. what the model can see
  2. what the model can suggest
  3. what the model can execute
  4. what a human must explicitly approve

If those four layers are not separated, the host application becomes a soft target for prompt injection and accidental escalation.

Treat tool descriptions like part of your attack surface

Every tool description is effectively an instruction channel. If it is vague, the model fills in the gaps. If it overpromises, the model will explore the edge of that promise. If it includes high-risk verbs like “delete,” “publish,” or “rotate,” then the host must wrap those actions in stronger policy than a simple LLM decision.

Good tool descriptions do three things:

  • explain the exact scope of the tool
  • state what it must not do
  • expose the minimum required parameters

Bad tool descriptions sound impressive. Good ones sound narrow.
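
As an illustration, a deliberately narrow tool description might look like the following sketch. The tool name, the scope wording, and the schema fields are hypothetical examples, not a real server's definition.

```python
# Hypothetical example of a narrow, scoped tool description.
# Name, wording, and fields are illustrative only.
SEARCH_DOCS_TOOL = {
    "name": "search_docs",
    "description": (
        "Read-only full-text search over the public documentation set. "
        "Must not fetch external URLs or read files outside the docs root."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
            "limit": {"type": "integer", "maximum": 20},
        },
        "required": ["query"],
    },
}
```

Note that it does exactly the three things above: it states its scope, it says what it must not do, and it exposes only two parameters.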

Classify tools by blast radius, not by category

A “filesystem” tool can be low risk or catastrophic depending on path scope. A “browser” tool can be harmless for read-only retrieval and dangerous for authenticated mutation. The correct classification is not the name of the tool. It is the blast radius if the model is wrong.

Use three classes:

  • low risk: read-only access with bounded scope
  • guarded: mutating actions that require confirmation
  • privileged: destructive or irreversible actions behind hard policy

That policy must live outside the model.

As a concrete example, a host-side policy table for those three classes might look like this:

```json
{
  "tools": {
    "search_docs": {
      "risk": "low",
      "mode": "auto"
    },
    "create_draft_release": {
      "risk": "guarded",
      "mode": "confirm"
    },
    "delete_production_secret": {
      "risk": "privileged",
      "mode": "deny"
    }
  }
}
```
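
To make the enforcement concrete, here is a minimal sketch of a host-side gate over such a table. The tool names, the `gate` helper, and the mode strings are illustrative assumptions, not part of the MCP specification; the important property is that the decision runs in the host process, outside the model's context.

```python
# Minimal host-side policy gate over a per-tool risk/mode table.
# Tool names and modes are illustrative assumptions.
POLICY = {
    "search_docs": {"risk": "low", "mode": "auto"},
    "create_draft_release": {"risk": "guarded", "mode": "confirm"},
    "delete_production_secret": {"risk": "privileged", "mode": "deny"},
}

def gate(tool_name: str, operator_confirmed: bool = False) -> bool:
    """Return True only if host policy allows this tool call."""
    entry = POLICY.get(tool_name)
    if entry is None:
        return False  # unknown tools are denied, never guessed at
    mode = entry["mode"]
    if mode == "auto":
        return True
    if mode == "confirm":
        return operator_confirmed  # guarded: require an explicit human yes
    return False  # "deny" and anything unrecognized
```

The deny-by-default branches matter more than the allow branches: a tool missing from the table, or carrying an unrecognized mode, is refused rather than passed through.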

  

Human approval should be specific, not theatrical

Too many teams add a generic confirmation dialog and call it safety. That is not approval. That is interface decoration.

Real approval must show:

  • exactly what resource is being touched
  • exactly what change is proposed
  • whether the action is reversible
  • what identity and environment are involved

If the operator cannot answer those questions in two seconds, the confirmation UI is not good enough yet.
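
One way to force those four answers into the approval flow is to make them required fields of the request object the UI renders. A sketch, with hypothetical field names:

```python
# Sketch of an approval payload: every question the operator must be
# able to answer becomes a required field. Field names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ApprovalRequest:
    resource: str         # exactly what resource is being touched
    proposed_change: str  # exactly what change is proposed
    reversible: bool      # can this action be undone?
    identity: str         # which credential performs the action
    environment: str      # e.g. "staging" vs "production"

    def summary(self) -> str:
        tag = "REVERSIBLE" if self.reversible else "IRREVERSIBLE"
        return (f"[{tag}] {self.identity}@{self.environment}: "
                f"{self.proposed_change} on {self.resource}")
```

Because the dataclass has no defaults, a confirmation screen built on it cannot be rendered without all four answers present.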

The host application owns the final policy

This is where mature teams pull ahead. The protocol can describe tools, prompts, and resources in a standardized way, but the host still decides what actually happens. That means:

  • scoping credentials outside the model
  • limiting accessible paths and domains
  • inserting approval gates
  • logging every high-risk action with full context

The model can recommend. The host must enforce.
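
For the path-and-domain point, host-side limiting can be as simple as resolving every path and hostname before a tool call is forwarded. A sketch, with illustrative roots and hosts:

```python
# Host-side allowlist sketch: the host, not the model, decides which
# paths and domains a tool call may touch. Values are illustrative.
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_ROOTS = [Path("/srv/agent/workspace").resolve()]
ALLOWED_HOSTS = {"docs.example.com"}

def path_allowed(candidate: str) -> bool:
    # resolve() collapses ".." so traversal cannot escape the root
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(root) for root in ALLOWED_ROOTS)

def host_allowed(url: str) -> bool:
    return urlparse(url).hostname in ALLOWED_HOSTS
```

Resolving before checking is the load-bearing detail: a naive string-prefix check would wave through `/srv/agent/workspace/../../etc/shadow`.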

The 2026 checklist I would enforce before launch

Before I would trust an MCP-based workflow in production, I would require these controls:

  • tool descriptions reviewed like code
  • per-tool risk labels
  • host-side allowlists for paths, hosts, or environments
  • irreversible actions denied by default
  • confirmation screens for all guarded actions
  • audit trails for every privileged request
  • replayable traces for incident review

That list sounds strict only until the first time an agent tries to take a shortcut on live infrastructure.
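
For the audit-trail and replayable-trace items, each high-risk request can be serialized with enough context to reconstruct the decision later. A minimal sketch; the field set is an assumption, not a standard:

```python
# Sketch of a structured audit record for a high-risk tool request.
# The field set is an assumption; the point is replayable context.
import json
import time

def audit_record(tool: str, args: dict, decision: str, operator: str) -> str:
    """Serialize one tool request and its outcome as a JSON log line."""
    return json.dumps({
        "ts": time.time(),
        "tool": tool,
        "args": args,           # full arguments, so the call is replayable
        "decision": decision,   # e.g. "allowed" | "confirmed" | "denied"
        "operator": operator,   # who approved or triggered it
    }, sort_keys=True)
```

Logging the full arguments, not just the tool name, is what makes incident review possible: "a secret was deleted" is a headline, while the record above is evidence.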

Final view

MCP is a strong foundation for interoperable tooling, but interoperability is not the same thing as governance. The winning teams in 2026 will not be the ones with the most tools. They will be the ones with the clearest permission boundary between model intent, host policy, and human authority.

That boundary is what keeps the toolchain useful instead of fragile.