
Sandboxes Won't Save Your Agent Code

The Conventional Wisdom

The industry consensus suggests running AI-generated code in isolated sandboxes before it touches production. Tools like Greptile's TREX feature and Cursor's cloud agents create these sandboxed environments. The logic seems sound—if the code runs without errors in a sandbox, it's safe to merge.

Security teams appreciate this model. It matches a familiar pattern: put untrusted input (AI-generated code) into a controlled environment, observe its behavior, and only then allow it near real systems. The sandbox acts as a containment area for potential issues.

The Limitations of Sandboxes

Here's the issue: sandboxes catch syntax errors and obvious runtime failures, but they miss the integration bugs that actually take production down.

Your agents are writing code that interacts with distributed systems—microservices, message queues, databases, external APIs. A sandbox with mocked dependencies confirms the code runs. It doesn't confirm the code works with your actual Postgres connection pool settings, Redis cluster topology, or third-party payment gateway's rate limits.

Consider Stripe, whose internal agents ship over 1,000 reviewed PRs weekly. Those agents aren't just generating isolated functions. They're modifying code that affects authentication flows, payment processing, and data pipelines. A sandbox that validates "this Lambda function executes" misses "this Lambda function will exhaust your RDS connection pool under normal load."
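To make the connection-pool point concrete, here is a minimal sketch of why a single sandbox run tells you nothing about pool exhaustion. Everything in it is an assumption for illustration: `ConnectionPool` is a toy stand-in built on a semaphore, not a real database driver, and the pool size, hold time, and timeout are invented numbers.

```python
import threading
import time


class ConnectionPool:
    """Toy stand-in for a bounded database connection pool (hypothetical)."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def acquire(self, timeout):
        # Returns False if no connection frees up within `timeout` seconds.
        return self._slots.acquire(timeout=timeout)

    def release(self):
        self._slots.release()


def handler(pool, hold_seconds, timeout):
    """Stand-in for an agent-written handler that holds one connection per call."""
    if not pool.acquire(timeout):
        return False  # pool exhausted; a real driver would raise a timeout error
    try:
        time.sleep(hold_seconds)  # simulated query time
        return True
    finally:
        pool.release()


def run_concurrently(pool, invocations, hold_seconds=0.3, timeout=0.05):
    """Run `invocations` handlers at once and return how many failed."""
    results = []
    lock = threading.Lock()

    def worker():
        ok = handler(pool, hold_seconds, timeout)
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(invocations)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(False)


sandbox_failures = run_concurrently(ConnectionPool(5), invocations=1)
load_failures = run_concurrently(ConnectionPool(5), invocations=20)
```

The single invocation succeeds, which is all a one-shot sandbox check observes. The same handler under 20 concurrent invocations leaves callers timing out on a pool of 5, and that failure only appears when the concurrency and pool limits resemble the real ones.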

The security implications are more severe than performance bugs. An agent might generate code that passes all sandbox checks but introduces a timing vulnerability in your authentication flow or creates a new path that bypasses your input validation layer. The sandbox sees successful execution but not the integration failure that creates the vulnerability.

The Evidence

Consider what Devin does: it runs agent code in a full environment, not an isolated sandbox. This approach is intentional. System-level bugs, the kind that cause security incidents and compliance violations, only emerge when code interacts with the complete system.

Take authentication as an example. Your sandbox might mock your OAuth provider and validate that the agent's code correctly handles the token exchange. But does it respect your session timeout policy defined in your identity provider? Does it log authentication attempts to your SIEM? Does it trigger your anomaly detection rules? The sandbox doesn't know.
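One of those questions, whether authentication attempts get logged, can be turned into an explicit check. The sketch below uses only Python's standard `logging` module. The logger name `audit.auth`, the `auth_attempt` event name, and both login functions are hypothetical. In a real setup the records would flow to your actual log pipeline rather than an in-memory list.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a check can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


audit_log = logging.getLogger("audit.auth")  # hypothetical logger name


def login(username, password_ok):
    """Stand-in for a login handler that records every attempt."""
    audit_log.info("auth_attempt", extra={"user": username, "success": password_ok})
    return password_ok


def silent_login(username, password_ok):
    """Stand-in for agent-written code that works but logs nothing."""
    return password_ok


def emits_auth_audit_event(fn, username):
    """Return True if calling fn produces an 'auth_attempt' record for username."""
    handler = ListHandler()
    previous_level = audit_log.level
    audit_log.addHandler(handler)
    audit_log.setLevel(logging.INFO)
    try:
        fn(username, False)  # a failed attempt must still be logged
    finally:
        audit_log.removeHandler(handler)
        audit_log.setLevel(previous_level)
    return any(
        r.getMessage() == "auth_attempt" and getattr(r, "user", None) == username
        for r in handler.records
    )
```

Both functions return the same value for the same input, so a sandbox that only checks return values treats them as equivalent. Only the check on emitted records tells them apart.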

PCI DSS v4.0.1 Requirement 6.2.4 calls for software engineering techniques that prevent or mitigate common software attacks and related vulnerabilities in bespoke and custom software. A sandbox that validates isolated execution doesn't help you meet this requirement when the vulnerability exists in how the new code integrates with your existing authentication middleware or logging pipeline.

The shared production-like system model that companies like Signadot are developing addresses this gap. Instead of mocking your database, you connect to an actual database with production-like data volumes and query patterns. Instead of stubbing your API calls, you hit real endpoints (or production mirrors) that enforce actual rate limits and authentication requirements.

What to Do Instead

Build verification environments that mirror production topology, not just production code. Your agent verification process should include:

Real Integration Points: Connect to actual databases, message queues, and internal APIs. Use separate instances with production-equivalent configurations, not mocks that return canned responses.

Production-Equivalent Data Volumes: Your agent's database query might work fine against 100 test records. It could create a denial-of-service condition against your 50-million-row production table. Test with realistic scale.

Actual Security Controls: Run the agent's code through your real WAF rules, authentication middleware, and input validation layers. If it bypasses a control in the verification environment, it will bypass it in production.

Full Observability Stack: Route logs, metrics, and traces through your actual monitoring pipeline. If the agent's code doesn't generate the audit logs your SOC 2 Type II controls require, you need to know before merging, not during your next audit.
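The data-volume item above lends itself to a cheap automated check: inspect the query plan before the query ever meets a large table. This is a rough sketch that uses SQLite from Python's standard library as a stand-in for your production database. The `orders` table, the index name, and the `full_scans` helper are all invented for illustration. A real verification environment would run the equivalent `EXPLAIN` against its production-like database, where the planner's behavior and statistics will differ.

```python
import sqlite3


def full_scans(conn, sql, params=()):
    """Return plan steps that walk an entire table or index (SCAN) instead of seeking (SEARCH)."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail); detail begins with SCAN or SEARCH.
    return [row[3] for row in plan if row[3].startswith("SCAN")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 1000, 9.99) for i in range(10_000)),
)

query = "SELECT total FROM orders WHERE customer_id = ?"

# No index on customer_id yet, so the plan reads every row.
before = full_scans(conn, query, (42,))

# With an index in place, the plan seeks directly to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = full_scans(conn, query, (42,))
```

The query returns correct results in both cases, so a functional test against 100 rows would not notice the difference. A plan check like this complements testing at realistic scale; it does not replace it.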

This doesn't mean giving agents direct production access. It means creating an environment where the system behavior matches production, even if the infrastructure is separate. When your agent modifies an API endpoint, verify it against the actual rate limiting logic, the real authentication flow, and the complete request validation chain.

For security-critical changes, add a verification step that runs your security test suite (the same OWASP ASVS checks you run in your CI/CD pipeline) against the agent's changes in the production-like environment. An agent that introduces an SQL injection vulnerability can pass a sandbox check, because the happy path still executes, but it should fail your parameterized query validation tests.
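As a small illustration of the SQL injection case, the sketch below shows two lookups that behave identically on ordinary input and differ only when given a hostile one. SQLite stands in for your database, and the `users` table, both lookup functions, and the payload are assumptions made up for the example.

```python
import sqlite3


def find_user_unsafe(conn, username):
    """Vulnerable: builds the SQL text by string interpolation."""
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()


def find_user_safe(conn, username):
    """Parameterized: the driver binds username as data, never as SQL."""
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def leaks_rows_on_injection(lookup):
    """Return True if a classic injection payload makes the lookup return rows."""
    payload = "nobody' OR '1'='1"  # no user has this name
    return len(lookup(make_db(), payload)) > 0


# Happy path: both versions find alice, so an execution-only check passes both.
happy_unsafe = find_user_unsafe(make_db(), "alice")
happy_safe = find_user_safe(make_db(), "alice")
```

A check like `leaks_rows_on_injection` is the kind of test that separates the two implementations. The happy-path results are the same for both.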

When Sandboxes Are Useful

Sandboxes have a place in your agent verification workflow—just not as the final validation step.

Use sandboxes for the initial execution check. Before you spin up a full production-like environment, verify that the agent's code actually runs. This catches obvious failures quickly: syntax errors, missing dependencies, basic runtime exceptions. There's no reason to use infrastructure resources on code that won't execute.

Sandboxes also work for isolated, pure functions. If your agent generates a data transformation utility that takes input and returns output without touching external systems, a sandbox is sufficient. The security risk is contained to the function itself.

Sandboxes remain critical for detecting malicious code. Before running agent-generated code near real systems—even production-like ones—execute it in a heavily monitored sandbox that can detect and block attempts to exfiltrate data, make unauthorized network calls, or modify files outside the expected scope.

The sequence matters: sandbox first to catch obvious failures and malicious behavior, then production-like environment to catch integration bugs and security issues that only surface in system context. Both steps are necessary. Neither is sufficient alone.
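The ordering described above can be sketched as a small staged gate. This is an illustrative outline, not a real CI integration: the stage names and the check functions you would pass in are hypothetical placeholders for whatever your sandbox and production-like environment actually run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


@dataclass
class StageResult:
    stage: str
    passed: bool


def verify(change: str, stages: List[Stage]) -> List[StageResult]:
    """Run stages in order and stop at the first failure."""
    results = []
    for name, check in stages:
        passed = check(change)
        results.append(StageResult(name, passed))
        if not passed:
            break  # later, more expensive stages never run
    return results


def approved(results: List[StageResult], stages: List[Stage]) -> bool:
    """A change is approved only if every stage ran and every stage passed."""
    return len(results) == len(stages) and all(r.passed for r in results)
```

A call might look like `verify("change-123", [("sandbox", sandbox_check), ("production_like", integration_check)])`, where both check functions are yours to supply. Requiring every stage to have run is what encodes "neither is sufficient alone": a change that only cleared the sandbox is not approved.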

Your agents are writing code at scale. Ensure you're verifying it at scale too.
