Skip to main content
OpenClaw Agent Exploited via Email InjectionIncident
5 min readFor Security Engineers

OpenClaw Agent Exploited via Email Injection

Your AI agent just forwarded AWS credentials to an attacker. The request came through a routine email contact. No exploit kit, no zero-day — just a text field the agent trusted implicitly.

Two research teams documented separate attack paths against OpenClaw, a self-hosted AI agent. Both exploits succeeded because the agent's architecture treats all input as equally trustworthy. One research group injected code through contact metadata. The other exfiltrated credentials through a simple email social engineering attack.

What Happened

Imperva's research team identified that OpenClaw processed contact names, vCard fields, and location labels directly in the LLM prompt body. An attacker could craft a malicious contact entry containing instructions that the agent would execute as legitimate commands.

Separately, Varonis demonstrated a social engineering vector: they sent OpenClaw an email requesting help with an AWS configuration issue. The email included mock AWS credentials. OpenClaw forwarded the entire message — credentials included — to an attacker-controlled address when prompted to "share this with the team."

The maintainers released OpenClaw 2026.4.23, which moves contact names, vCard fields, and location labels out of the prompt body. This addresses the Imperva finding. The Varonis vector remains partially mitigated through prompt engineering alone.

Timeline

Initial deployment: Organizations install OpenClaw to automate email triage, calendar management, and data lookup tasks.

Vulnerability window: The agent processes untrusted input (contact fields, email content) with the same trust level as system instructions.

Imperva disclosure: Researchers demonstrate code execution through crafted contact metadata.

Varonis disclosure: Researchers show credential exfiltration through social engineering.

Partial patch: Version 2026.4.23 ships, isolating contact metadata from the prompt context.

Remaining risk: Email content processing still relies on the LLM's ability to distinguish legitimate requests from attacks — a capability that varies by model.

Which Controls Failed

Input validation: The agent accepted contact metadata and email content without sanitization or type checking. There was no distinction between data (a contact's name) and instructions (commands for the LLM to execute).

Least privilege: The agent had permissions to forward emails, access credentials in message bodies, and execute file operations — all without requiring explicit user approval for sensitive actions.

Output encoding: When forwarding emails or processing contact data, the agent didn't strip or escape content that could be reinterpreted as commands in downstream contexts.

Separation of duties: User-supplied data and system instructions shared the same execution context. The architecture provided no boundary between "what the user wants" and "what the contact file says."

Anomaly detection: No mechanism flagged unusual patterns like "forward this message containing AWS keys to an external address" or "execute this contact field as a command."

What Standards Require

OWASP ASVS v4.0.3 Requirement 5.1.1 mandates input validation using positive validation (allowlists) and contextual output encoding. Contact fields are strings; they should be validated as strings and encoded when passed to an LLM context. The same requirement applies to email content before it's processed as agent instructions.

OWASP Top 10 2021: A03 Injection directly addresses this failure mode. When you pass user-controlled data to an interpreter (in this case, an LLM), you must treat that data as untrusted. Prompt injection is injection — the attack surface changed, but the control requirement didn't.

NIST 800-53 Rev 5 Control SI-10 (Information Input Validation) requires checking validity, accuracy, and completeness of information inputs. A contact's name field should never contain instructions to exfiltrate data. If your validation logic can't distinguish between the two, your architecture is the problem.

ISO/IEC 27001:2022 Annex A.8.3 (Media Handling) covers information handling procedures. When your AI agent processes email, it's handling information that may contain sensitive data. The control objective is to prevent unauthorized disclosure — which means the agent needs explicit rules about what it can forward and to whom.

PCI DSS v4.0.1 Requirement 6.2.4 (for any organization handling payment data) requires that security vulnerabilities are identified using reputable sources and that risk rankings are assigned. If you're running AI agents with access to cardholder data environments, prompt injection vulnerabilities are in scope. You need a process to evaluate and patch them.

Lessons and Action Items

Treat LLM prompts as code execution contexts. Stop thinking of AI agents as "smart assistants" and start thinking of them as interpreters. Every input is a potential command injection. Implement the same controls you'd use for SQL queries or shell commands: parameterization, allowlisting, and context separation.

Isolate data from instructions at the architecture level. The OpenClaw patch moved contact metadata out of the prompt body — this is the right direction. Map every data flow in your agent: Where does user input enter? Where does it get interpreted? Can you separate "this is data to display" from "this is an instruction to execute"? If the answer is "the LLM figures it out," you don't have separation.

Implement approval gates for sensitive operations. Your agent should never forward emails containing credentials, execute file operations, or access external APIs without explicit user confirmation. Build a permission model: read-only operations proceed automatically, write operations require approval, and high-risk operations (anything involving secrets, external communication, or code execution) require multi-factor confirmation.

Test with adversarial inputs. Create a test suite of malicious contacts, emails, and calendar entries. Include prompt injection attempts, social engineering scenarios, and data exfiltration requests. Run this suite against every model you're considering — the Varonis research showed that different LLMs respond differently to the same attack. Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4 had different susceptibility profiles. Your testing should reveal these differences before you deploy.

Monitor for anomalous behavior. Log every action your agent takes: emails forwarded, files accessed, external requests made. Set thresholds: "agent forwarded 3 emails in 10 minutes" should trigger a review. "Agent accessed a message containing 'AWS_SECRET_ACCESS_KEY' and then sent an external email" should trigger an immediate block and alert.

Assume prompt engineering is not a security control. The remaining Varonis vector relies on the LLM's ability to detect social engineering. This is not a technical control — it's a hope. Document which risks you're accepting through prompt-based mitigations and which require architectural changes. Your risk register should reflect the difference.

The trust model that makes AI agents useful — their ability to interpret natural language and take action — is the same model that makes them exploitable. You can't patch your way out of an architectural problem. Build boundaries, enforce least privilege, and validate every input as if it's hostile. Because in an AI agent context, it might be.

Topics:Incident

You Might Also Like