AI Flagged 27 Bugs. Only 7 Got Fixed.

What Happened

The DARPA Artificial Intelligence Cyber Challenge (AIxCC) concluded with a $30,500,000 prize pool, showcasing AI's potential in vulnerability detection. Seven teams deployed AI systems against open source projects. Ada Logics verified 27 real-world vulnerabilities from the AI-generated findings.

The outcome? Twenty bugs remain unpatched. The cause was not negligence: the volume and presentation of AI findings overwhelmed already-stretched open source teams.

This isn't about AI failure. It's about a process failure that your security program needs to address before deploying AI vulnerability scanners at scale.

Timeline

Competition Phase: AI systems scanned open source codebases, generating vulnerability reports across multiple projects. Teams competed to identify the most critical issues using autonomous AI approaches.

Verification Phase: Ada Logics manually triaged the AI-generated reports. Of hundreds of potential findings, they confirmed 27 legitimate vulnerabilities requiring remediation.

Integration Phase: OpenSSF coordinated with project maintainers to patch the confirmed issues. Seven vulnerabilities were successfully fixed and deployed. Twenty remained in backlog or were deprioritized due to maintainer capacity constraints.

Post-Competition: The gap between "AI found it" and "maintainer fixed it" became the central lesson, rather than the detection capability itself.

Which Controls Failed or Were Missing

Vulnerability Management Process

The competition revealed a critical gap against PCI DSS v4.0.1 Requirement 6.3, "Security vulnerabilities are identified and addressed." Finding vulnerabilities is only half the control. Requirement 6.3.1 calls for identifying vulnerabilities and assigning each a risk ranking, and Requirement 6.3.3 calls for applying patches within timeframes tied to that ranking.

The AI systems excelled at identification. The process failed at risk ranking, communication, and remediation tracking. Maintainers received findings without:

  • Clear severity scoring aligned to their threat model
  • Exploitability context specific to their deployment patterns
  • Remediation guidance integrated into their existing workflow
  • Deduplication against known issues or accepted risks

Change Management Integration

NIST SP 800-53 Rev. 5 control CM-3 (Configuration Change Control) requires documented change control processes. When AI generates vulnerability reports faster than your change management system can process them, you create a backlog that obscures genuine critical issues.

The AIxCC findings arrived as a batch. Projects lacked the process to:

  • Automatically create tracked work items from AI reports
  • Assign ownership based on component architecture
  • Schedule remediation within sprint planning cycles
  • Verify fixes without re-running the entire AI scan
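The first two gaps in the list above (creating tracked work items and assigning ownership by component) can be sketched in a few lines. This is a minimal illustration, not a description of how any AIxCC project worked: the `OWNERS` table, the team names, and the finding keys (`id`, `rule`, `file`) are all hypothetical and would come from your own scanner output and repository layout.

```python
from dataclasses import dataclass

# Hypothetical mapping from path prefix to owning team; replace with your own.
OWNERS = {
    "src/auth/": "identity-team",
    "src/api/": "platform-team",
}
DEFAULT_OWNER = "security-triage"  # fallback when no prefix matches


@dataclass
class WorkItem:
    title: str
    owner: str
    source_finding_id: str


def owner_for(path: str) -> str:
    """Return the owner whose path prefix is the longest match for the file."""
    matches = [prefix for prefix in OWNERS if path.startswith(prefix)]
    if not matches:
        return DEFAULT_OWNER
    return OWNERS[max(matches, key=len)]


def to_work_item(finding: dict) -> WorkItem:
    """Convert one AI finding (assumed keys: id, rule, file) into a work item."""
    return WorkItem(
        title=f"{finding['rule']} in {finding['file']}",
        owner=owner_for(finding["file"]),
        source_finding_id=finding["id"],
    )
```

The resulting `WorkItem` would then be pushed to whatever tracker you already use; that call is left out because it depends on the tracker's own API.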

Resource Allocation

ISO 27001 Clause 5.1 requires management to ensure that adequate resources are available for information security. Open source maintainers typically operate with zero dedicated security budget and limited volunteer hours.

Twenty confirmed vulnerabilities went unpatched not because maintainers didn't care, but because they lacked the time, expertise, or organizational support to act on valid findings. This is the same resource constraint your security team faces when you deploy AI scanners without expanding remediation capacity.

What the Relevant Standard Requires

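PCI DSS v4.0.1 Requirement 6.3.1 mandates that you identify security vulnerabilities and assign each a risk ranking, and Requirement 6.3.2 mandates an inventory of bespoke, custom, and third-party software components to support vulnerability and patch management. AI detection tools help with identification. You still need humans to: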

  • Validate the finding applies to your deployment
  • Assess actual risk based on compensating controls
  • Prioritize against other security work
  • Track remediation to completion

NIST CSF v2.0 Function: Respond requires your incident response process to handle vulnerability disclosures efficiently. When AI generates findings at machine speed, your response process must scale proportionally. The AIxCC demonstrated what happens when detection outpaces response: 27 confirmed vulnerabilities against 7 fixes, a ratio of nearly 4:1.

SOC 2 Type II Common Criteria CC7.1 covers the detection and monitoring procedures you use to identify vulnerabilities. An AI tool whose reports are mostly unconfirmed (in AIxCC, 27 were confirmed out of hundreds of potential findings) creates a new risk: alert fatigue that causes your team to miss the genuine critical findings.

Lessons and Action Items for Your Team

Before You Deploy AI Scanners

Map your current vulnerability backlog. If you already have 200 open findings in your tracking system, adding AI-generated reports will multiply your triage burden. Close or accept existing issues first. Target: backlog under 50 items before introducing AI tools.

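Define severity thresholds. Decide which CVSS scores or risk ratings require immediate action versus scheduled remediation. Document this in your vulnerability management policy. As a baseline, PCI DSS v4.0.1 Requirement 6.3.3 expects critical and high-security patches to be installed within one month of release; a tighter internal target, such as Critical within 15 days and High within 30 days, is a policy decision rather than a PCI mandate.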
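A severity policy is easier to enforce once it is written down as data. The sketch below maps a CVSS v3.x base score to its standard qualitative band and then to a remediation due date. The `SLA_DAYS` values are placeholders: 15 and 30 days echo the targets mentioned above, while the medium and low values are assumptions added only to make the example complete.

```python
from datetime import date, timedelta

# Placeholder remediation windows in days; substitute your policy's values.
SLA_DAYS = {"critical": 15, "high": 30, "medium": 90, "low": 180}


def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating band.

    Scores below 4.0 (including a 0.0 "None" rating) are treated as low here.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


def due_date(score: float, confirmed_on: date) -> date:
    """Return the remediation deadline for a finding confirmed on a given day."""
    return confirmed_on + timedelta(days=SLA_DAYS[severity_from_cvss(score)])
```

Counting from the confirmation date rather than the scan date is a design choice: it keeps unverified AI output from starting a clock that nobody has agreed to.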

Assign triage ownership. Designate one person to review all AI-generated findings before they reach developers. Ada Logics performed this role in AIxCC. You need the equivalent function internally.

When You Receive AI Findings

Batch and deduplicate. AI tools often report the same vulnerability pattern across multiple files. Group related findings into a single work item. Your developers need "fix the input validation pattern in the authentication module," not "fix line 47, fix line 89, fix line 132."
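One way to do this grouping is to key findings on the vulnerability pattern and the component, and keep the individual locations as detail inside a single work item. The field names (`rule`, `component`, `file`, `line`) are assumptions about the scanner's output, not a real tool's schema.

```python
from collections import defaultdict


def group_findings(findings: list[dict]) -> list[dict]:
    """Collapse findings that share a rule and component into one group each."""
    groups = defaultdict(list)
    for finding in findings:
        groups[(finding["rule"], finding["component"])].append(finding)

    return [
        {
            "rule": rule,
            "component": component,
            # Keep every location so the developer can see the full extent.
            "locations": sorted((f["file"], f["line"]) for f in items),
        }
        for (rule, component), items in groups.items()
    ]
```

Three input-validation findings at lines 47, 89, and 132 of the same authentication module would come out as one group with three locations, which is the shape a developer can act on.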

Add exploitation context. The AI reports a SQL injection vulnerability. You need to document: Is this endpoint internet-facing? Does it process sensitive data? Are there WAF rules that mitigate it? This context determines priority.
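The three questions above can be recorded as structured fields and used to adjust a finding's base severity. The adjustment rules in this sketch are purely illustrative assumptions (one step up or down per factor); the point is that the context is captured explicitly and the resulting priority is reproducible.

```python
from dataclasses import dataclass

LEVELS = ["low", "medium", "high", "critical"]


@dataclass
class Context:
    internet_facing: bool
    handles_sensitive_data: bool
    waf_mitigated: bool


def priority(base_severity: str, ctx: Context) -> str:
    """Adjust a base severity by deployment context (illustrative rules only)."""
    idx = LEVELS.index(base_severity)
    if ctx.internet_facing and ctx.handles_sensitive_data:
        idx += 1  # exposed and sensitive: raise one level
    if not ctx.internet_facing:
        idx -= 1  # internal only: lower one level
    if ctx.waf_mitigated:
        idx -= 1  # compensating control in place: lower one level
    idx = max(0, min(idx, len(LEVELS) - 1))  # clamp to the valid range
    return LEVELS[idx]
```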

Integrate with existing workflows. Create tickets in your current system (Jira, GitHub Issues, Linear). Don't ask developers to check a separate AI dashboard. The AIxCC findings that got fixed were those that entered maintainer workflows seamlessly.

After Remediation

Verify the fix. Re-run the AI scan against the patched code. Confirm the vulnerability no longer appears. In AIxCC, only the seven fixed vulnerabilities could reach this step; the other 20 are still unpatched, so there is nothing to verify yet. Make verification a systematic step so that a fix is not counted as done until a re-scan confirms it.
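A simple way to systematize verification is to give each finding a stable fingerprint and check whether that fingerprint still appears in a re-scan of the patched code. This is a sketch under assumptions: the finding keys (`id`, `rule`, `file`, `function`) are hypothetical, and the fingerprint deliberately ignores line numbers because they shift when code is edited.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Stable identity for a finding that survives line-number changes."""
    raw = f"{finding['rule']}|{finding['file']}|{finding['function']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def verify_fixes(claimed_fixed: list[dict], rescan: list[dict]) -> dict[str, bool]:
    """Map each claimed-fixed finding id to True if it is absent from the re-scan."""
    still_present = {fingerprint(f) for f in rescan}
    return {f["id"]: fingerprint(f) not in still_present for f in claimed_fixed}
```

The re-scan passed in can be limited to the files a patch touched, so verification does not depend on repeating a full scan.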

Document accepted risks. Some AI findings won't warrant fixes: the severity is low, compensating controls exist, or exploitation requires insider access. Document why you're not fixing them. This satisfies audit requirements and prevents the AI from re-flagging the same issue.
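An accepted-risk register can double as a suppression list for future scans. The sketch below assumes a hypothetical register format in which each entry names the rule and file, records a reason, and carries a review date, so that an acceptance expires instead of silencing a finding forever.

```python
from datetime import date


def filter_accepted(findings: list[dict], accepted: list[dict], today: date) -> list[dict]:
    """Drop findings covered by an accepted-risk entry that has not yet expired."""
    active = {
        (entry["rule"], entry["file"])
        for entry in accepted
        if entry["review_by"] >= today  # expired acceptances stop suppressing
    }
    return [f for f in findings if (f["rule"], f["file"]) not in active]
```

Once the review date passes, the finding resurfaces on the next scan and the acceptance has to be renewed or the issue fixed.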

Measure your ratio. Track: AI findings received, findings confirmed valid, findings remediated, time to remediation. The AIxCC ratio was 27 confirmed from hundreds flagged, 7 fixed from 27 confirmed. If your ratios are similar, your process needs tuning. A low confirmation rate points at the AI configuration or your triage criteria; a low fix rate points at remediation capacity.
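These ratios are simple to compute once the counts are tracked. The function below is a minimal sketch; `received` is optional because the article gives the AIxCC total only as "hundreds", so no confirmation rate is computed when that number is unknown.

```python
from typing import Optional


def funnel(confirmed: int, remediated: int, received: Optional[int] = None) -> dict:
    """Compute triage funnel metrics from raw counts."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    if not 0 <= remediated <= confirmed:
        raise ValueError("remediated must be between 0 and confirmed")
    if received is not None and received < confirmed:
        raise ValueError("received cannot be smaller than confirmed")

    return {
        # Share of raw AI findings that survived human triage.
        "confirmation_rate": None if received is None else confirmed / received,
        # Share of confirmed findings that were actually fixed.
        "fix_rate": remediated / confirmed,
        "open_confirmed": confirmed - remediated,
    }
```

With the AIxCC figures from this article, `funnel(confirmed=27, remediated=7)` gives a fix rate of about 0.26 and 20 confirmed findings still open.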

The AIxCC wasn't an AI failure. It was a process stress test. AI found real vulnerabilities. The gap between detection and remediation exposed the bottleneck in every security program: limited human capacity to triage, prioritize, and fix issues at machine-generated scale.

Your action item isn't "buy better AI." It's "build the process that turns AI findings into fixed code."
