LLM Code Scanner Generates Working Exploits

SecureLayer7 released Sandyaa in late 2024 under an MIT license. The tool uses large language models to scan source code for vulnerabilities and then writes exploit code to prove they're real. Within weeks, security teams reported that this functional exploit code was landing in their repositories without passing through their existing code review processes.

The Issue

Sandyaa analyzes codebases using large language models to identify security flaws. Unlike traditional static analyzers that flag potential issues, Sandyaa generates working proof-of-concept exploits for confirmed vulnerabilities. The tool writes Python or shell scripts that demonstrate exactly how an attacker could exploit the flaw.

This led to an unintended consequence. Teams running Sandyaa in their CI/CD pipelines found themselves committing exploit code to their repositories. The exploits lived alongside the vulnerable code they were meant to highlight. Some organizations discovered these artifacts during security audits months later.

Timeline of Events

Initial Release (Q4 2024)
SecureLayer7 publishes Sandyaa as open source, highlighting its ability to reduce false positives by proving vulnerabilities are exploitable.

First Integrations (Weeks 1-4)
Early adopters integrate Sandyaa into automated scanning workflows, appreciating the reduction in noise compared to traditional SAST tools.

Discovery Phase (Weeks 5-12)
Security teams conducting repository audits find exploit code in commit history. Some exploits target vulnerabilities that have since been patched, yet the exploit files themselves were never removed from version control. Others demonstrate attacks against third-party dependencies still in use.

Containment (Ongoing)
Organizations scramble to identify which repositories contain Sandyaa-generated exploits and whether any reached production artifacts.

Controls That Failed or Were Missing

No Output Sanitization
Sandyaa's default behavior writes exploit code directly to disk in the same directory structure as the target codebase. Teams treated these files as documentation rather than executable attack tools.

Missing Code Review Gates
Automated security scans typically run pre-commit or in isolated CI environments, so their output rarely passes in front of a human reviewer. Organizations had no review requirement for scanner output before it entered version control.

Inadequate Secrets Detection
Many exploit scripts included hardcoded credentials or API endpoints needed to demonstrate the vulnerability. Standard secrets scanning tools flagged these, but teams dismissed them as "test data" from security tooling.

No Artifact Classification
Organizations lacked processes to tag and track security research artifacts separately from production code. Exploit scripts propagated through the same channels as application code.

Compliance Requirements

PCI DSS v4.0.1 Requirement 6.3.1 requires identifying security vulnerabilities using industry-recognized sources and assigning each a risk ranking. When using tools that generate exploit code, you should also control how that code is stored and transmitted. Requirement 11.4.1 covers the penetration testing methodology, including retention of testing and remediation results; exploit scripts produced by a scanner warrant comparable documented handling procedures and restricted access.

OWASP ASVS v4.0.3 Requirement 14.2.3 calls for Subresource Integrity (SRI) to validate externally hosted application assets such as JavaScript libraries, CSS, and web fonts. The requirement does not mention scanner output, but the same principle applies by analogy: when your scanner generates executable code, you need integrity controls around those artifacts.

ISO/IEC 27001:2022 Annex A Control 8.31 (Separation of development, test and production environments) requires those environments to be separated and secured. Exploit code is sensitive information. Writing it into your codebase without access controls lets it travel from development toward production alongside application code, which works against this control.

NIST SP 800-53 Rev. 5 SA-11 (Developer Testing and Evaluation) requires security testing, and enhancement SA-11(8) calls for dynamic code analysis with documented results. Neither prescribes how to handle exploit artifacts, so your own procedures must ensure testing tools don't introduce new risk through their output.

Action Items for Your Team

Isolate Scanner Output
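Configure Sandyaa and similar tools to write results to a dedicated directory outside your source tree. Use an access-restricted location or a separate artifact storage system rather than a shared directory such as /tmp, which other local users can often read. Never let scanner output mix with application code.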

# Bad
sandyaa scan ./src --output ./src/security-findings/

# Better
sandyaa scan ./src --output /var/security-artifacts/$(date +%Y%m%d)/
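One way to enforce that separation is to create the output directory with owner-only permissions and to refuse any location inside the source tree. The following is a minimal sketch under stated assumptions: the helper name make_artifact_dir is hypothetical and not part of Sandyaa, and the artifact base directory is assumed to exist already.

```shell
# Hypothetical helper (not part of Sandyaa): create a dated, owner-only
# artifact directory and refuse any base directory inside the source tree.
# Usage: make_artifact_dir <source-root> <existing-artifact-base>
make_artifact_dir() {
    local src_root base out
    src_root=$(cd "$1" && pwd -P) || return 1   # canonical source tree path
    base=$(cd "$2" && pwd -P) || return 1       # canonical artifact base path
    case "$base/" in
        "$src_root"/*)
            echo "refusing: $base is inside source tree $src_root" >&2
            return 1
            ;;
    esac
    out="$base/$(date +%Y%m%d)"
    mkdir -p "$out" && chmod 700 "$out" || return 1
    printf '%s\n' "$out"                        # print the directory to use
}
```

The printed path can then be passed to the scanner's output option, so a misconfigured job fails before anything is written next to application code.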

Implement Output Review
Add a manual approval gate before any security scanner output enters version control. Treat exploit code with the same controls you'd apply to penetration testing results: restricted access, time-limited storage, and audit logging.
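A local pre-commit check is one possible first layer for such a gate. The sketch below is illustrative only: the function names are hypothetical, and the path patterns are assumptions about what scanner output looks like, so adjust them to what your tool actually writes.

```shell
# Return 0 if a path looks like scanner-generated exploit output.
# The patterns are illustrative assumptions, not a Sandyaa specification.
is_scanner_artifact() {
    case "$1" in
        *exploit*.py|*poc*.sh) return 0 ;;
        exploits/*|*/exploits/*|pocs/*|*/pocs/*) return 0 ;;
        security-findings/*|*/security-findings/*) return 0 ;;
    esac
    return 1
}

# Read staged paths on stdin, one per line; return non-zero if any of them
# needs security-team approval before it may be committed.
check_staged() {
    local path blocked=0
    while IFS= read -r path; do
        if is_scanner_artifact "$path"; then
            echo "blocked pending security review: $path" >&2
            blocked=1
        fi
    done
    return "$blocked"
}
```

In a .git/hooks/pre-commit script this could be invoked as `git diff --cached --name-only --diff-filter=ACM | check_staged || exit 1`. Local hooks can be skipped with `git commit --no-verify`, so a check like this complements a server-side approval rule and does not replace it.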

Tag Security Artifacts
If you must store exploit code in your repository (for security research or red team exercises), use explicit tagging:

  • Add .exploit or .poc extensions
  • Store in a /security-research directory with a CODEOWNERS file
  • Include a README explaining the artifact's purpose and required handling
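The directory, README, and ownership rule can be scaffolded in one step. This is a minimal sketch assuming GitHub-style CODEOWNERS syntax with the file at the repository root; the function name init_research_dir and the team handle in the usage note are hypothetical.

```shell
# Hypothetical scaffold: create security-research/ with a handling README
# and register an owner for it in CODEOWNERS (GitHub-style syntax assumed).
# Usage: init_research_dir <repo-root> <owner-handle>
init_research_dir() {
    local repo="$1" owner="$2"
    local dir="$repo/security-research"
    mkdir -p "$dir" || return 1
    # Write the README only if none exists, so local edits are preserved.
    if [ ! -e "$dir/README" ]; then
        printf '%s\n' \
            'This directory holds proof-of-concept exploit code for security research.' \
            'Access is restricted. Do not copy these files into build or release artifacts.' \
            > "$dir/README"
    fi
    # Append the ownership rule once; repeated runs do not duplicate it.
    touch "$repo/CODEOWNERS"
    grep -qxF "/security-research/ $owner" "$repo/CODEOWNERS" ||
        printf '/security-research/ %s\n' "$owner" >> "$repo/CODEOWNERS"
}
```

For example, `init_research_dir . @example-org/security-team` would route review requests for anything under /security-research to that team, provided the hosting platform enforces code-owner review on the branch.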

Extend Secrets Scanning
Update your secrets detection rules to flag exploit-specific patterns: SQL injection payloads, shell command strings, network scanning scripts. Don't dismiss these as false positives just because they came from a security tool.
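As a rough illustration of what such rules might look for, the sketch below greps files for a handful of payload signatures. The function name and every pattern are assumptions chosen for demonstration; a production rule set would need tuning and will produce both false positives and misses.

```shell
# Illustrative payload detector: prints matching lines with line numbers and
# returns 0 if any file contains an exploit-like pattern, 1 if none does.
# The patterns are deliberately simple examples, not a complete rule set.
scan_for_payloads() {
    grep -nEi \
        -e "union[[:space:]]+(all[[:space:]]+)?select" \
        -e "'[[:space:]]*or[[:space:]]+'?1'?[[:space:]]*=[[:space:]]*'?1" \
        -e "/dev/tcp/[0-9a-z.]+/[0-9]+" \
        -e "(^|[^[:alnum:]_])nmap[[:space:]]+-" \
        -- "$@"
}
```

The four expressions cover, in order, SQL UNION injection, a classic tautology payload, a bash reverse-shell redirect, and an nmap invocation with options.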

Set Retention Policies
Define how long you keep exploit code. For most teams, 30 days is sufficient to verify the fix. After that, delete the artifacts or move them to offline storage with access controls.
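A retention policy like this can be enforced with a scheduled cleanup job. The sketch below assumes artifacts live in a dedicated directory and that file modification time is an acceptable proxy for age; the function name purge_old_artifacts is hypothetical.

```shell
# List (default) or delete regular files older than a retention window.
# find's "-mtime +N" matches files last modified more than N whole days ago.
# Usage: purge_old_artifacts <dir> [days=30] [list|delete]
purge_old_artifacts() {
    local dir="$1" days="${2:-30}" mode="${3:-list}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    if [ "$mode" = "delete" ]; then
        find "$dir" -type f -mtime +"$days" -exec rm -f -- {} +
    else
        find "$dir" -type f -mtime +"$days" -print
    fi
}
```

Listing is the default so that a dry run can be reviewed before anything is removed; passing delete as the third argument performs the removal.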

Document Tool Behavior
Before deploying any AI-driven security tool, document what it writes to disk, where it writes, and who can access the output. Include this in your SA-11 testing procedures if you're subject to NIST 800-53.

Review Existing Repositories
Search your version control for Sandyaa output patterns:

  • Files matching *exploit*.py or *poc*.sh
  • Commits with messages containing "Sandyaa" or "vulnerability demonstration"
  • Directories named /exploits, /pocs, or /security-findings

If you find artifacts, determine whether they're still relevant. Delete obsolete exploits and move active ones to controlled storage.
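The search patterns listed above can be sketched as two small helpers, one for the working tree and one for commit history. Both function names are hypothetical, and the patterns are the article's heuristics rather than a guaranteed signature of Sandyaa output.

```shell
# Working-tree search: matching files and directories, skipping .git itself.
find_artifacts_in_tree() {
    find "$1" -path '*/.git' -prune -o \
        \( -type f \( -name '*exploit*.py' -o -name '*poc*.sh' \) -print \) -o \
        \( -type d \( -name exploits -o -name pocs -o -name security-findings \) -print \)
}

# History search: commits whose message mentions the tool, followed by every
# commit that added a matching file, including files deleted since then.
find_artifacts_in_history() {
    git -C "$1" log --all --oneline -i \
        --grep='sandyaa' --grep='vulnerability demonstration'
    git -C "$1" log --all --diff-filter=A --name-only --format='%h %s' \
        -- '*exploit*.py' '*poc*.sh'
}
```

The history search matters because deleting a file in a later commit does not remove it from earlier ones; an exploit found only in history remains retrievable by anyone with clone access until the history is rewritten.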

The core issue isn't Sandyaa's functionality: generating exploit code helps confirm that vulnerabilities are real. The problem is treating that output like any other scanner result. When your tools write executable attack code, you need controls that match the risk.
