Tool Injection
Tool injection is an attack technique targeting AI agents and large language model (LLM) integrations, in which an adversary manipulates the tools or tool descriptions available to an agent so that it executes unintended or malicious actions. This can occur when an attacker influences which tools an agent can call, or alters the metadata describing those tools, causing the agent to behave in ways its operators did not authorize.
Tool injection is an attack vector in AI-agent architectures, including those built on protocols such as MCP (Model Context Protocol), in which an adversary introduces, substitutes, or modifies the tool definitions, descriptions, or registrations that an LLM-based agent consumes when deciding which tool to invoke and with what parameters. By poisoning tool metadata or injecting unauthorized tool endpoints into the agent's toolset, the attacker can redirect agent behavior to exfiltrate data, escalate privileges, or perform unauthorized operations.

Tool injection is related to, but distinct from, prompt injection: prompt injection targets the model's instruction-following behavior via crafted input text, whereas tool injection targets the tool-selection and tool-invocation layer of the agent framework.

Detecting tool injection typically requires runtime inspection of tool registries and invocation chains; static analysis of application code alone generally cannot identify manipulated tool metadata or unauthorized registrations that occur dynamically. Defenses include tool allowlisting, integrity verification of tool descriptions, and monitoring of tool invocation patterns. The efficacy of these controls varies by agent framework and deployment context, and no single mitigation currently covers all tool injection variants.
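The allowlisting and integrity-verification defenses mentioned above can be combined by pinning a hash of each trusted tool's metadata and rejecting anything that deviates at runtime. A minimal sketch, assuming tools are advertised as simple dicts with `name`, `description`, and `parameters` fields (the field names and registry shape are illustrative, not from any specific framework):

```python
import hashlib
import json

def description_digest(tool: dict) -> str:
    """Canonical SHA-256 digest over the metadata fields the agent trusts
    when selecting a tool."""
    canonical = json.dumps(
        {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool.get("parameters", {}),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_toolset(advertised: list[dict], pinned: dict[str, str]) -> list[dict]:
    """Keep only tools that are both allowlisted and whose metadata digest
    matches the pinned value; drop injected or tampered entries."""
    safe = []
    for tool in advertised:
        name = tool.get("name")
        if name not in pinned:
            continue  # not allowlisted: an unknown (possibly injected) tool
        if description_digest(tool) != pinned[name]:
            continue  # metadata changed since pinning: possible tool injection
        safe.append(tool)
    return safe
```

Pinning the digest at deployment time means a tool whose description is later rewritten by a compromised server simply disappears from the agent's usable toolset rather than silently steering its behavior.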
Why it matters
As organizations increasingly deploy AI agents that autonomously select and invoke external tools, the tool-selection layer becomes a critical trust boundary. Tool injection attacks exploit this boundary by manipulating the metadata, registrations, or descriptions that an agent relies on when deciding which tool to call and what parameters to pass. Because the agent typically treats tool definitions as authoritative, a successfully injected or modified tool definition can redirect the agent to exfiltrate sensitive data, perform unauthorized operations, or escalate privileges, all while appearing to operate normally from the agent's perspective. The consequences can be severe in environments where agents have broad permissions or access to sensitive resources.
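Because the agent treats tool definitions as authoritative, the trust boundary described above is often easiest to enforce one layer down, at invocation time, where the operator rather than the model decides what may run. A minimal sketch of such a gate (the tool names and `dispatch` callback are hypothetical, not part of any specific framework):

```python
# Operator-controlled allowlist: the model cannot expand this set,
# no matter what tool definitions it has been shown.
AUTHORIZED_TOOLS = {"search_docs", "read_file"}

def guarded_invoke(tool_name: str, args: dict, dispatch):
    """Refuse to execute any tool call the operator did not explicitly
    authorize, regardless of which tools the agent believes it has."""
    if tool_name not in AUTHORIZED_TOOLS:
        raise PermissionError(f"blocked unauthorized tool call: {tool_name}")
    return dispatch(tool_name, args)
```

Even if an injected tool definition convinces the model to request a call such as `guarded_invoke("send_email", ...)`, the gate rejects it because authorization lives outside the model's context.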
Who it's relevant to
Inside Tool Injection
Common questions
Answers to the questions practitioners most commonly ask about Tool Injection.