Skip to main content
AI Agent Skills Passed Every Security Scan—Then Turned MaliciousGeneral
5 min readFor Security Engineers

AI Agent Skills Passed Every Security Scan—Then Turned Malicious

Security scanners gave it a clean bill of health. GitHub stars suggested community trust. The skill looked legitimate by every conventional measure. Then it changed—after installation—and roughly 26,000 AI agents were already running it.

This isn't a theoretical attack. AIR demonstrated how flawed our current validation approach is for AI agent skills, with implications far beyond their specific experiment. If you're responsible for security controls in an environment where AI agents operate, you're likely relying on assumptions that don't hold.

Here's what actually protects you—and what doesn't.

Myth 1: Security Scanners Validate Skill Safety

The Reality: Static analysis tools evaluate code at a single point in time. They can't predict what that code will do after you install it.

AIR's fake skill passed every security scanner they tested—not because the scanners were poorly configured, but because the skill was genuinely benign at scan time. The malicious behavior only appeared after the skill was deployed and received updated instructions from its remote endpoint.

This is the fundamental flaw in treating AI agent skills like traditional software packages. A Python library doesn't rewrite its own behavior based on API calls. An AI agent skill can—and often must—pull dynamic content to function. Your scanner sees version 1.0 at rest. Your production environment runs version 1.0 plus whatever it fetches at runtime.

What you need instead: Implement runtime behavior monitoring that tracks what skills do after installation. Look for unexpected network calls, privilege escalations, or data access patterns that don't match the skill's stated purpose. This requires instrumentation at the agent execution layer, not just at the package distribution layer.

Myth 2: GitHub Stars Indicate Security Vetting

The Reality: Social proof metrics measure popularity, not security review. They're easily manipulated and tell you nothing about code safety.

The experiment leveraged GitHub stars as a trust signal because developers and security teams often treat them as meaningful. A repository with 500 stars feels more legitimate than one with 12. But stars can be purchased, generated by bot networks, or accumulated through marketing rather than technical merit.

More importantly, even legitimate stars don't represent security audits. Those 500 developers who starred a repository might have evaluated its functionality, its documentation quality, or its API design. Almost none of them performed a security review. You're trusting crowd sentiment about features, not crowd verification of safety.

What you need instead: Obtain explicit security attestations from sources you control. If your organization requires PCI DSS v4.0.1 compliance, your validation process should include documented evidence that a skill doesn't violate Requirement 6.2.4 (secure coding practices) or Requirement 6.4.3 (script execution controls). Stars don't give you that. Internal security reviews or third-party audits do.

Myth 3: Initial Validation Provides Ongoing Protection

The Reality: Skills that pass validation today can become malicious tomorrow through content updates, dependency changes, or compromised remote endpoints.

This is where the AIR experiment becomes particularly instructive. They didn't exploit a vulnerability in the scanning process itself. They exploited the time gap between validation and execution. The skill was safe when scanned. It became unsafe when it pulled new instructions from its control server.

Traditional software supply chain security assumes relatively stable artifacts. You validate a container image, sign it, and deploy it. The bits don't change. AI agent skills operate under different assumptions—they're designed to be dynamic, to incorporate new knowledge, to adapt their behavior based on external signals. That's the feature, not the bug.

What you need instead: Implement continuous validation that treats each execution as a new security event. This means monitoring outbound connections, tracking which APIs the skill calls, and flagging behavior changes from baseline. If a skill that previously only accessed local files suddenly starts making HTTP requests to unknown domains, your controls should catch that—even if the initial package passed all your scans.

Myth 4: Skill Marketplaces Enforce Security Standards

The Reality: Marketplace approval processes optimize for functionality and user experience, not comprehensive security review.

When a skill marketplace reviews submissions, they're checking for basic functionality, policy compliance (no obvious malware, no prohibited content), and user experience quality. They're not performing penetration testing. They're not analyzing every code path for injection vulnerabilities. They're not validating that the skill's runtime behavior matches its stated capabilities.

The scale makes this impossible. If a marketplace processes hundreds of skill submissions weekly, each review might get 15-30 minutes of human attention. That's enough to verify the skill works and doesn't contain obvious malware signatures. It's not enough to evaluate dynamic behavior, analyze remote endpoint security, or assess the skill's posture against OWASP ASVS v4.0.3 verification requirements.

What you need instead: Develop your own validation layer that assumes marketplace approval means "probably not obviously malicious" rather than "definitely secure." Apply the same rigor you'd apply to any third-party code: sandboxed testing, network traffic analysis, privilege requirement review, and data access auditing.

Myth 5: Open Source Skills Are Safer Because "Many Eyes" Review Them

The Reality: Visibility doesn't guarantee review, and review doesn't guarantee security expertise.

The "many eyes" theory assumes that open source code receives continuous security scrutiny from a distributed community. In practice, most repositories have a handful of active contributors and many passive users. Those users might read the documentation or skim the main execution paths, but they're not systematically auditing for security issues.

Even when security-conscious developers do review code, they're often looking at the repository state, not the runtime behavior. They see the code that fetches remote content. They don't necessarily test what happens when that remote content is malicious or compromised.

What you need instead: Treat open source skills like any other third-party dependency. That means vulnerability scanning, yes, but also behavioral analysis. Run the skill in an isolated environment and monitor what it actually does. Check its network traffic. Review its file system access. Verify that its runtime behavior matches the capabilities described in its manifest.

What to Do Instead

Stop relying on validation theater. Start building controls that match the actual threat model:

Implement runtime monitoring. Deploy tooling that tracks skill behavior during execution, not just during installation. Flag unexpected network calls, privilege escalations, or data access patterns.

Require security attestations. If your compliance framework demands it, document how each skill meets specific requirements. "Passed marketplace review" isn't an attestation. "Verified against NIST 800-53 Rev 5 AC-6 (Least Privilege)" is.

Sandbox unknown skills. Run new skills in isolated environments with limited access to production data and systems. Monitor their behavior over multiple executions before granting broader access.

Version-pin your dependencies. If a skill pulls remote content, understand what that content is and how it can change. Consider proxying or caching remote resources so you control what the skill receives.

Build detection for behavior drift. Establish baselines for what each skill normally does, then alert when behavior deviates. A skill that suddenly starts accessing different APIs or making new types of network requests deserves investigation.

The 26,000 agents that installed AIR's fake skill weren't compromised because their operators were careless. They were compromised because the security model they relied on—scan once, trust forever—doesn't work for dynamic, remotely-updated code. Your controls need to match the actual architecture you're defending.

Topics:General

You Might Also Like