AI Agent Leaked Corporate Data Through a Poisoned Tool Description

What Happened

An AI agent leaked sensitive data by following instructions embedded in a malicious tool description. The agent operated within its parameters, without breaking rules or bypassing authentication. It simply read a tool's description through the Model Context Protocol (MCP) and executed what appeared to be legitimate functionality.

This is not hypothetical. The MCPTox benchmark, which tests tool poisoning against real-world MCP servers, reported attack success rates as high as 72.8% against the agents it evaluated. The attack exploits how agents discover and use external tools: they read natural language descriptions, decide which tool fits their task, and execute it. When a description contains hidden instructions to exfiltrate data, the agent treats them as part of the tool's usage guidance and complies.

Attack Timeline

The weakness is architectural rather than tied to a single incident, so the timeline below describes the general attack pattern. Here's how it unfolds:

T+0: Organization deploys AI agent with MCP integration
T+hours/days: Attacker compromises a tool repository or creates a malicious tool with a poisoned description
T+execution: Agent queries available tools for a legitimate task
T+seconds: Agent reads poisoned description, interprets hidden exfiltration instructions as part of normal operation
T+seconds: Agent executes data leak while logging the action as successful task completion
T+unknown: Organization discovers breach through external notification or audit

Detection can take weeks or months because the agent's logs show successful operations, not errors or violations.

Failed or Missing Controls

Input validation on tool descriptions: Organizations treated tool descriptions as trusted metadata rather than untrusted input. There was no sanitization, content inspection, or verification that descriptions matched actual tool functionality.

Least privilege for agent actions: Agents operated with broad permissions rather than scoped access tied to specific, validated tools. If an agent can read customer databases to generate reports, it can also read them to exfiltrate data when instructed.

Tool provenance verification: There was no cryptographic signing of tool descriptions or verification that tools came from trusted sources. The supply chain for AI agent tools had less scrutiny than npm dependencies.

Monitoring for anomalous data access: Standard data loss prevention (DLP) rules focused on user actions and known exfiltration patterns. They didn't flag when an AI agent accessed data because "that's what it's supposed to do."

Human-in-the-loop for sensitive operations: Agents executed data access and transmission without requiring approval for high-risk actions. The assumption that agents only do what they're programmed to do ignored the reality that their programming includes following instructions in tool descriptions.

Relevant Standards

ISO/IEC 27001:2022 Annex A.8.2 (Privileged access rights): Requires allocation and use of privileged access rights to be restricted and controlled. Your AI agent is a privileged user and needs the same access controls, monitoring, and approval workflows as admin accounts.

NIST 800-53 Rev 5 AC-6 (Least Privilege): Requires organizations to employ the principle of least privilege, allowing only authorized accesses necessary to accomplish tasks. An agent authorized to "analyze sales data" shouldn't have write access to external APIs or unrestricted network egress.

OWASP Top 10 for LLM Applications 2025: Includes "Supply Chain" (LLM03) and "Unbounded Consumption" (LLM10). Tool poisoning is not a named entry in that list, but it fits the supply chain pattern of trusting external components without validation, and OWASP's guidance for agentic applications recognizes it as a significant vulnerability.

PCI DSS v4.0.1 Requirement 6.3.2: If your agents process payment data, you must maintain an inventory of bespoke and custom software, and of the third-party software components incorporated into it. MCP tools are third-party components. You need to know what they are, where they came from, and what they're authorized to do.

SOC 2 Type II CC6.6 (Logical and Physical Access Controls): Requires the entity to implement logical access security measures that protect against threats from sources outside its system boundaries. An external tool repository feeding descriptions to your agents is outside your system boundary.

Lessons and Action Items for Your Team

Treat tool descriptions as untrusted input. Run them through content filters before your agents can read them. Strip out markdown, links, and embedded instructions. If a tool description contains anything beyond a function signature and plain-text summary, reject it.
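A minimal sketch of such a filter, assuming descriptions arrive as plain strings. The length limit, the pattern list, and the function name review_description are illustrative assumptions, not part of MCP or any specific framework, and pattern matching alone will not catch every injection.

```python
import re
import unicodedata

MAX_LENGTH = 300  # assumed upper bound for a plain-text summary

# Hypothetical deny patterns; a real deployment would tune and extend these.
DENY_PATTERNS = {
    "markup tag": re.compile(r"<[^>]+>"),
    "link": re.compile(r"https?://|\[[^\]]*\]\([^)]*\)", re.IGNORECASE),
    "markdown formatting": re.compile(r"```|\*\*|__|^#+\s", re.MULTILINE),
    "instruction to the model": re.compile(
        r"\b(ignore|disregard)\b.{0,40}\b(previous|prior|above)\b"
        r"|\bdo not (tell|mention|inform)\b"
        r"|\b(before|after) (using|calling) this tool\b",
        re.IGNORECASE,
    ),
}


def review_description(description: str) -> list[str]:
    """Return the reasons to reject a description; an empty list means accept."""
    reasons = []
    if len(description) > MAX_LENGTH:
        reasons.append("too long")
    # Zero-width and other format/control characters can hide text from reviewers.
    if any(
        unicodedata.category(ch) in ("Cf", "Cc") and ch not in "\n\t"
        for ch in description
    ):
        reasons.append("hidden or control character")
    for label, pattern in DENY_PATTERNS.items():
        if pattern.search(description):
            reasons.append(label)
    return reasons
```

Rejecting on any match is deliberately strict. A legitimate description that trips a pattern can be rewritten by its publisher, while a poisoned one that slips through cannot be recalled.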

Implement cryptographic signing for tool sources. Before your agent loads a tool description, verify it's signed by a trusted publisher. Maintain an allowlist of approved tool repositories. Treat MCP as a critical part of your AI supply chain.
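The verification step might look like the sketch below. To stay within the Python standard library it uses HMAC-SHA256 with a shared key as a stand-in for a real publisher signature; a production setup would more likely use an asymmetric scheme such as Ed25519 so that verifiers never hold signing keys. The publisher name, key, and manifest fields are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical allowlist: publisher id -> verification key (assumption).
TRUSTED_PUBLISHERS = {"internal-tools": b"replace-with-a-real-secret"}


def canonical_bytes(manifest: dict) -> bytes:
    """Serialize the manifest deterministically so signatures are stable."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Publisher side: compute the tag over the full manifest, description included."""
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, publisher: str, signature: str) -> bool:
    """Agent side: accept only manifests from allowlisted publishers with a valid tag."""
    key = TRUSTED_PUBLISHERS.get(publisher)
    if key is None:
        return False  # unknown publisher, reject before any comparison
    expected = sign_manifest(manifest, key)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the tag covers the description text itself, an attacker who edits the description after publication invalidates the signature.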

Scope agent permissions to explicit tool-action pairs. Don't give your agent database read access. Give it permission to execute generate_sales_report(), and let that tool hold the database read access. When the agent tries to execute exfiltrate_to_external_api(), the permission check fails even if a tool description convinced it to try.
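One way to sketch that check is a dispatcher that consults an allowlist before any tool runs. The role name, registry, and tool bodies here are placeholders for illustration, and the exfiltration function exists only to show the denial path.

```python
class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


# Hypothetical policy: agent role -> the only tools that role may invoke.
AGENT_TOOL_ALLOWLIST = {"sales-analyst": {"generate_sales_report"}}


def generate_sales_report(quarter: str) -> str:
    # Placeholder body; the real tool would hold the database read access.
    return f"Sales report for {quarter}"


def exfiltrate_to_external_api(payload: str) -> str:
    # Stand-in for a malicious tool; it should never be reachable.
    return "sent"


TOOL_REGISTRY = {
    "generate_sales_report": generate_sales_report,
    "exfiltrate_to_external_api": exfiltrate_to_external_api,
}


def call_tool(agent_role: str, tool_name: str, **kwargs):
    """Enforce the allowlist outside the model, then dispatch to the tool."""
    allowed = AGENT_TOOL_ALLOWLIST.get(agent_role, set())
    if tool_name not in allowed:
        raise ToolNotPermitted(f"{agent_role} may not call {tool_name}")
    return TOOL_REGISTRY[tool_name](**kwargs)
```

The check lives in the dispatcher rather than in the prompt, so it holds regardless of what the model was persuaded to attempt.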

Deploy behavior monitoring specific to agents. Flag when an agent accesses data volumes outside its normal pattern. Alert when it calls tools it hasn't used before. Monitor for data egress to new destinations. Your SIEM rules for users don't translate directly to agents.
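As a rough illustration, the three checks above can be expressed as a comparison between each tool call and a per-agent baseline. The baseline fields, thresholds, and event shape are assumptions; in practice these values would come from your own telemetry and feed your SIEM rather than return a list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentBaseline:
    """Hypothetical record of what is normal for one agent."""
    known_tools: set
    known_destinations: set
    max_rows_per_call: int


def check_event(
    baseline: AgentBaseline,
    tool: str,
    destination: Optional[str],
    rows_read: int,
) -> list[str]:
    """Return alerts for a single tool call; an empty list means nothing unusual."""
    alerts = []
    if tool not in baseline.known_tools:
        alerts.append(f"first use of tool: {tool}")
    if destination is not None and destination not in baseline.known_destinations:
        alerts.append(f"egress to new destination: {destination}")
    if rows_read > baseline.max_rows_per_call:
        alerts.append(f"data volume above baseline: {rows_read} rows")
    return alerts
```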

Require approval for high-risk agent actions. Data export, credential access, and financial transactions need human confirmation before execution. Build the approval request into your agent framework, not the individual tools, so a malicious tool cannot skip it.
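A framework-level gate could be sketched as follows. The action categories mirror the ones named above; the approver callback is a hypothetical hook that would, in a real system, open a ticket or prompt a human reviewer.

```python
from typing import Any, Callable

# Action categories that always require human confirmation (assumed labels).
HIGH_RISK_ACTIONS = {"data_export", "credential_access", "financial_transaction"}


def execute_action(
    action: str,
    handler: Callable[..., Any],
    approver: Callable[[str, dict], bool],
    **kwargs: Any,
) -> Any:
    """Run handler only after approval when the action is high risk."""
    if action in HIGH_RISK_ACTIONS and not approver(action, kwargs):
        raise PermissionError(f"{action} was not approved")
    return handler(**kwargs)
```

Keeping the gate in the framework means every tool inherits it, including tools added later.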

Audit your current MCP integrations. List every tool source your agents can access. Verify you control those repositories or trust their operators. Check whether tool descriptions have changed recently. Look for tools that request more data access than their stated function requires.
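To check whether descriptions have changed, one option is to pin a hash of each approved description and compare it against what the tool source currently serves. This sketch assumes you can obtain the current descriptions as a name-to-text mapping; how you fetch them depends on your MCP client and is not shown.

```python
import hashlib


def fingerprint(description: str) -> str:
    """Stable SHA-256 fingerprint of a description's exact text."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def find_changed_tools(pinned: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Compare current descriptions with pinned fingerprints.

    pinned maps tool name -> fingerprint recorded at approval time.
    current maps tool name -> description text served right now.
    Returns tool name -> "new", "changed", or "removed".
    """
    findings = {}
    for name, description in current.items():
        if name not in pinned:
            findings[name] = "new"
        elif fingerprint(description) != pinned[name]:
            findings[name] = "changed"
    for name in pinned:
        if name not in current:
            findings[name] = "removed"
    return findings
```

Any "new" or "changed" result is a prompt for human review before the agent is allowed to see the updated description.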

Test your agents with adversarial tool descriptions. The MCPTox benchmark is public. Run it against your deployment. If your agents leak data in testing, they'll leak it in production.

The control gap isn't exotic. You already know how to validate input, verify supply chains, and enforce least privilege. Apply these controls to AI agents now, before your agent's logs show a successful execution of something unintended.
