AI Agent Governance: Proactive Control Before Catastrophe

Your AI agents can execute code, access sensitive data, and communicate externally. One in four MCP servers already combines these capabilities. The question isn't whether you're monitoring them; it's whether you're controlling them before they act.

Scope - What This Guide Covers

This guide addresses governance controls for AI agents using Model Context Protocol (MCP) servers and Skills frameworks. You'll learn how to implement capability-based restrictions before deployment, not after incidents. This includes:

Risk assessment for MCP servers and Skills
The No Excessive CAP framework from OWASP LLM06:2025
Implementation patterns for capability restrictions
Enforcement mechanisms you can deploy this quarter

What this guide doesn't cover: LLM prompt injection defenses, model training security, or general API security patterns.

Key Concepts and Definitions

MCP Servers: Components that extend AI agent capabilities through defined interfaces. They provide tools for file system access, code execution, database queries, and external API calls.

Skills: Discrete capabilities an AI agent can invoke. A Skill might be "read file," "execute shell command," or "send email."

CAP Framework: Capability, Access, and Permission controls. Drawing on OWASP LLM06:2025 (Excessive Agency), this framework requires you to limit each agent's capabilities to the minimum needed for its intended function.

Capability Creep: The pattern where agents accumulate permissions over time without review. Half of the MCP servers that communicate externally also handle untrusted input and sensitive data in the same toolset, a direct result of capability creep.

Requirements Breakdown

OWASP LLM06:2025 establishes the No Excessive CAP principle. Here's how it maps to implementation:

Capability Restrictions: Your agent should not have access to capabilities beyond its documented purpose. If an agent summarizes customer feedback, it doesn't need code execution or database write access.

Access Controls: Separate agents by data sensitivity. An agent processing public documentation shouldn't share toolsets with one handling payment data.

Permission Boundaries: Enforce least privilege at the MCP level. If an agent needs to read configuration files, restrict it to specific directories rather than granting filesystem-wide access.
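A directory-scoped permission boundary can be sketched as a path check that runs before any file read. This is a minimal sketch, not a feature of any particular MCP server: the root `/etc/myapp/conf.d` and the function name `is_path_allowed` are hypothetical, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical allow-listed directory; substitute the agent's real config path.
ALLOWED_ROOTS = [Path("/etc/myapp/conf.d")]


def is_path_allowed(requested, allowed_roots=ALLOWED_ROOTS):
    """Return True only if the resolved path sits under an allowed root."""
    # resolve() collapses ".." segments and follows symlinks, so a request
    # such as "conf.d/../../shadow" is judged by where it really points.
    resolved = Path(requested).resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)


print(is_path_allowed("/etc/myapp/conf.d/app.yaml"))
print(is_path_allowed("/etc/myapp/conf.d/../../shadow"))
```

Resolving the path before comparing it is the design choice that matters here: a plain string-prefix check would accept the traversal request in the second call.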

The framework requires three control layers:

Pre-deployment capability audit: Document every Skill your agent can invoke.
Runtime enforcement: Block capability invocations outside the approved set.
Periodic review: Quarterly assessment of whether capabilities still match use cases.
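The runtime enforcement layer can be sketched as a deny-by-default gate that every Skill invocation passes through. The class names `CapabilityGate` and `CapabilityDenied`, and the agent and Skill names in the usage lines, are hypothetical; a real deployment would place this check in whatever component dispatches tool calls.

```python
class CapabilityDenied(Exception):
    """Raised when an agent requests a Skill outside its approved set."""


class CapabilityGate:
    def __init__(self, approved):
        # approved: dict mapping agent name -> iterable of approved Skill names
        self._approved = {
            agent: frozenset(skills) for agent, skills in approved.items()
        }

    def invoke(self, agent, skill, handler, *args, **kwargs):
        # Deny by default: an unknown agent has an empty approved set.
        if skill not in self._approved.get(agent, frozenset()):
            raise CapabilityDenied(f"{agent} is not approved for {skill!r}")
        return handler(*args, **kwargs)


gate = CapabilityGate({"feedback-summarizer": ["read_file"]})
print(gate.invoke("feedback-summarizer", "read_file", lambda: "file contents"))

try:
    gate.invoke("feedback-summarizer", "execute_shell", lambda: "never runs")
except CapabilityDenied as exc:
    print("blocked:", exc)
```

Because the handler is only called after the check passes, a denied invocation never reaches the underlying tool.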

Implementation Guidance

Step 1: Inventory Your Agent Capabilities

Build a capability matrix for each agent. List every MCP server it connects to and every Skill it can invoke. Tag each capability:

Data sensitivity: Public, Internal, Confidential, Restricted
Action type: Read, Write, Execute, Communicate
Reversibility: Can you undo this action automatically?

Focus on combinations. An agent with "read sensitive data" + "communicate externally" creates exfiltration risk. An agent with "untrusted input" + "code execution" creates injection risk.
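The matrix and the combination check can be sketched in a few lines. The field names, the `untrusted_input` flag, and the choice to treat Confidential and Restricted as "sensitive" are assumptions made for illustration; adapt them to your own classification scheme.

```python
from dataclasses import dataclass

# Assumption: these two levels count as "sensitive" for exfiltration checks.
SENSITIVE = {"Confidential", "Restricted"}


@dataclass(frozen=True)
class Capability:
    skill: str
    sensitivity: str               # Public, Internal, Confidential, Restricted
    action: str                    # Read, Write, Execute, Communicate
    reversible: bool
    untrusted_input: bool = False  # hypothetical extra tag


def risky_combinations(capabilities):
    """Flag the two combinations described above for one agent's matrix."""
    touches_sensitive = any(c.sensitivity in SENSITIVE for c in capabilities)
    communicates = any(c.action == "Communicate" for c in capabilities)
    takes_untrusted = any(c.untrusted_input for c in capabilities)
    executes = any(c.action == "Execute" for c in capabilities)

    findings = []
    if touches_sensitive and communicates:
        findings.append("exfiltration")
    if takes_untrusted and executes:
        findings.append("injection")
    return findings


matrix = [
    Capability("read_customer_table", "Confidential", "Read", True),
    Capability("send_email", "Public", "Communicate", False),
]
print(risky_combinations(matrix))  # -> ['exfiltration']
```

Neither capability in the example is alarming alone; the finding only appears when both sit in the same agent's matrix.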

Step 2: Apply No Excessive CAP

For each agent, ask:

Does this agent need code execution? If your agent analyzes logs, it doesn't need shell access. Remove the capability entirely.

Does this agent need external communication? If your agent operates on internal data, disable external API calls at the MCP configuration level.

Does this agent need write access? Read-only agents can't modify or delete data. If your agent generates reports but humans approve them, keep the agent read-only and build a separate approval workflow.
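The three questions can be turned into a pruning pass over an agent's Skill list. This is a sketch under assumptions: the question keys, the Skill names, and the idea of tagging each Skill with a single action type are illustrative, not part of any MCP configuration format.

```python
# Each question, when answered "no", blocks one action type.
QUESTION_TO_ACTION = {
    "needs_code_execution": "Execute",
    "needs_external_communication": "Communicate",
    "needs_write_access": "Write",
}


def prune_capabilities(capabilities, answers):
    """capabilities: dict Skill -> action type; answers: dict question -> bool.

    Returns (kept, removed). A missing answer is treated as "no".
    """
    blocked = {
        action
        for question, action in QUESTION_TO_ACTION.items()
        if not answers.get(question, False)
    }
    kept = {s: a for s, a in capabilities.items() if a not in blocked}
    removed = sorted(set(capabilities) - set(kept))
    return kept, removed


log_analyzer = {
    "read_logs": "Read",
    "run_shell": "Execute",
    "post_webhook": "Communicate",
}
kept, removed = prune_capabilities(log_analyzer, {})
print(kept)     # -> {'read_logs': 'Read'}
print(removed)  # -> ['post_webhook', 'run_shell']
```

Treating an unanswered question as "no" keeps the default restrictive: a capability survives only when someone has explicitly justified it.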

Step 3: Implement Technical Controls

Configuration-level restrictions: Many MCP servers and clients support capability allow-lists. Define permitted Skills in your MCP configuration file before the agent starts.

Network segmentation: Deploy agents that handle sensitive data in isolated network segments. Use firewall rules to block external communication for agents that don't need it.

Execution sandboxing: For agents that require code execution, use container-based sandboxes with no network access and read-only filesystems except for designated output directories.
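One way to express the sandbox described above is as a container launch command. The sketch below assumes Docker as the container runtime and only builds the argument list; the image name, the `/output` mount target, and the host directory are hypothetical.

```python
def sandbox_command(image, command, output_dir):
    """Build a `docker run` argument list for an isolated execution sandbox."""
    return [
        "docker", "run", "--rm",
        "--network", "none",   # no network access from inside the container
        "--read-only",         # root filesystem is mounted read-only
        "--cap-drop", "ALL",   # drop all Linux capabilities
        # The only writable location: a bind-mounted output directory.
        "--mount", f"type=bind,source={output_dir},target=/output",
        image,
        *command,
    ]


cmd = sandbox_command(
    "python:3.12-slim",
    ["python", "-c", "print('hello')"],
    "/srv/agent-output",
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a host where Docker is installed and the output directory exists. Some workloads also need a writable scratch area, which this sketch deliberately leaves out.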

Audit logging: Log every capability invocation with context: which agent, which Skill, what parameters, what result. Your SIEM should alert on capability usage outside normal patterns.
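A structured audit record covering those four fields might look like the sketch below. The logger name `agent.audit` and the field names are assumptions; handler configuration and shipping to a SIEM are left to your deployment.

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


def audit_record(agent, skill, params, result):
    """Emit one JSON line per capability invocation and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "skill": skill,
        "params": params,
        "result": result,
    }
    # default=str keeps the logger from failing on non-JSON-native values.
    line = json.dumps(record, sort_keys=True, default=str)
    audit_log.info(line)
    return line


print(audit_record("report-bot", "read_file", {"path": "conf.d/app.yaml"}, "ok"))
```

One JSON object per line keeps the records easy to parse downstream, which is what pattern-based alerting depends on.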

Step 4: Build Review Cadence

Quarterly, review each agent:

Has its function changed?
Are all its capabilities still necessary?
Have any high-risk capability combinations emerged?

When you add a new Skill to an existing agent, repeat the risk assessment. Capability additions should require the same approval process as deploying a new agent.
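Part of the quarterly review can be automated by comparing each agent's approved Skills with what it actually invoked during the review window. The sketch assumes you can extract `(agent, skill)` pairs from your audit logs; the agent and Skill names are hypothetical.

```python
def unused_skills(approved, invocations):
    """approved: dict agent -> iterable of Skills.
    invocations: iterable of (agent, skill) pairs from the review window.

    Returns a dict of agent -> sorted Skills that were approved but never used.
    """
    used = {}
    for agent, skill in invocations:
        used.setdefault(agent, set()).add(skill)

    report = {}
    for agent, skills in approved.items():
        idle = set(skills) - used.get(agent, set())
        if idle:
            report[agent] = sorted(idle)
    return report


approved = {
    "report-bot": {"read_file", "send_email"},
    "log-analyzer": {"read_logs"},
}
window = [("report-bot", "read_file"), ("log-analyzer", "read_logs")]
print(unused_skills(approved, window))  # -> {'report-bot': ['send_email']}
```

An unused Skill is a candidate for removal, not proof that it is unnecessary; some capabilities are only exercised in rare workflows, so the output is input to the review rather than a decision.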

Common Pitfalls

Pitfall 1: Trusting prompts to enforce security

Your system prompt tells the agent "never execute code on production systems." The agent has code execution capability anyway. Under adversarial input or model confusion, that prompt won't hold. Remove the capability instead.

Pitfall 2: Observability without enforcement

You log every agent action. You can see when an agent executes unexpected code. But logging doesn't stop the action; it only tells you it happened. Implement technical controls that prevent the action, not just detect it.

Pitfall 3: Capability sprawl through convenience

Your team builds an agent that needs to read customer data. Instead of creating a dedicated MCP server with read-only access to customer tables, you connect it to your existing admin MCP server, which has full database access. Now your agent can drop tables. Build purpose-specific MCP servers for new agents.

Pitfall 4: Ignoring capability combinations

Each individual capability looks reasonable. Reading files: fine. Communicating externally: fine. Together: data exfiltration risk. Assess combinations, not just individual capabilities.

Pitfall 5: No rollback plan for capability changes

You restrict an agent's capabilities and it breaks a workflow you didn't know about. Have a documented process to quickly restore capabilities while you investigate, then re-apply restrictions with better scoping.
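A rollback process is easier to follow when every capability change is stored as a version. The sketch below keeps versions in memory for illustration; the class name `CapabilityConfigHistory` and the configuration shape are hypothetical, and in practice the history would more likely live in version control.

```python
import copy


class CapabilityConfigHistory:
    """Keep every capability configuration so a restriction can be undone."""

    def __init__(self, initial):
        self._versions = [copy.deepcopy(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot mutate stored history.
        return copy.deepcopy(self._versions[-1])

    def apply(self, new_config):
        self._versions.append(copy.deepcopy(new_config))

    def rollback(self):
        # Never discard the initial version.
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = CapabilityConfigHistory({"report-bot": ["read_file", "send_email"]})
history.apply({"report-bot": ["read_file"]})  # restriction breaks a workflow
print(history.rollback())  # -> {'report-bot': ['read_file', 'send_email']}
```

After the rollback, the narrower configuration can be re-applied once the affected workflow has been investigated and the restriction rescoped.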

Quick Reference Table

Risk Pattern	Detection Method	Control Implementation	Review Frequency
Code execution + untrusted input	Capability matrix audit	Remove code execution or sandbox with no network	Before deployment
Sensitive data + external communication	MCP server configuration review	Network segmentation, disable external APIs	Quarterly
Write access + automated workflows	Permission boundary analysis	Make agent read-only, add human approval step	After each capability change
Capability creep	Quarterly capability review	Remove unused Skills, split multi-purpose agents	Quarterly
MCP server sharing across agents	Agent-to-MCP mapping	Create purpose-specific MCP servers	Before adding agent to existing MCP

Next Actions

This week: Build your capability matrix for your three highest-risk agents.
This month: Implement No Excessive CAP controls for those three agents.
This quarter: Extend capability restrictions to all production agents and establish a quarterly review schedule.

Your agents don't need every capability they could have. They need exactly the capabilities their function requires, plus technical controls that prevent them from exceeding those bounds.

Reference: OWASP Top 10 for LLM Applications 2025, LLM06:2025 Excessive Agency