
Your AI Coding Agent Isn't Slow Because It's Dumb

The conventional wisdom: When AI coding agents don't boost productivity, teams assume they need a smarter model. They consider waiting for GPT-5, switching to Claude Opus, or fine-tuning on proprietary codebases. Engineering managers delay adoption until models "better understand our architecture."

This is backwards.

The real issue: The problem isn't the model's intelligence—it's the lack of a robust feedback infrastructure. You're asking an AI to write production code with the same verification tools you'd give an intern: manual code review and maybe a linter. Then you wonder why it can't work autonomously.

The real limitation is what happens after the agent generates code. Can it run your test suite? Can it deploy to a staging environment? Can it validate its changes against your API contracts? Can it measure the performance impact? Most implementations answer "no" to these questions, then blame the model for being unproductive.

The evidence: OpenAI shipped a working product with millions of lines of code using a team of just three engineers. Stripe's Minions produce over a thousand merged pull requests every week. The difference isn't the model; it's the feedback architecture.

The pattern is consistent: teams that build comprehensive verification environments see AI agents operate with genuine autonomy. Teams that don't build them end up with expensive assistants that require constant human supervision.

Building Comprehensive Feedback: Consider what "comprehensive feedback" means. When Stripe's agents generate code, they can:

  • Execute the full test suite and interpret failures
  • Deploy to isolated environments and verify behavior
  • Run security scanners and triage findings
  • Measure performance characteristics
  • Validate against API specifications

Each of these feedback loops lets the agent iterate independently. Without them, every verification step requires a human, reducing your "AI agent" to just a code generator with extra steps.
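
In code, that iteration has a simple shape. Here's a minimal sketch in Python, where `generate_patch` and `run_checks` are hypothetical stand-ins for your model call and your real verification tools:

```python
# Sketch of the agent feedback loop: generate, verify, iterate.
# `generate_patch` and `run_checks` are hypothetical stand-ins for
# your model call and your real verification tools.
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    details: str  # structured feedback the model can act on


def generate_patch(task: str, feedback: str, workdir: str) -> None:
    """Hypothetical model call: apply a candidate patch in workdir."""


def run_checks(workdir: str) -> list[CheckResult]:
    """Stub for the real harness: tests, scanners, staging deploy."""
    return [CheckResult("tests", True, "412 passed")]


def agent_loop(task: str, workdir: str, max_attempts: int = 5) -> bool:
    for _ in range(max_attempts):
        failures = [r for r in run_checks(workdir) if not r.passed]
        if not failures:
            return True  # every loop is green; ready for human review
        # Feed structured failures back so the next patch targets
        # what actually broke, not a guess about what might have.
        feedback = "\n".join(f"{r.name}: {r.details}" for r in failures)
        generate_patch(task, feedback, workdir)
    return False  # escalate to a human after repeated failures
```

The shape is the point: structured results flow back into the model, and a human enters the loop only when the checks pass or the agent gives up.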

This is crucial for compliance teams because AI-generated code doesn't get a special exemption from your security requirements. If you're subject to PCI DSS v4.0.1 Requirement 6.2.4 (addressing common vulnerabilities in bespoke software), the code still needs security review whether a human or an agent wrote it. An agent with access to your SAST tools, dependency scanners, and security test suite can perform that initial review itself. It can iterate on findings before human review, meaning your security team reviews higher-quality code.

What to do instead: Build the verification harness before scaling AI agent usage. This involves creating an environment that lets agents validate their own work.

Test execution: Your agent needs read/write access to your test infrastructure. Not just "can it run pytest"—can it interpret test failures, understand coverage reports, and add tests for uncovered paths? If your current setup requires a human to parse test output and decide what to fix, your agent will too.
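
As a concrete sketch of what structured test feedback can look like, the snippet below runs pytest with its built-in JUnit XML report and reduces it to failures an agent can act on. The paths and the `run_tests` helper are illustrative, not a prescribed API:

```python
# Sketch: run pytest with its built-in JUnit XML report and return
# failures as structured data an agent can act on, instead of raw
# terminal output. Paths and the helper name are illustrative.
import subprocess
import xml.etree.ElementTree as ET


def run_tests(repo_dir: str) -> list[dict]:
    report = f"{repo_dir}/report.xml"
    subprocess.run(
        ["pytest", f"--junitxml={report}", "-q"],
        cwd=repo_dir,
        capture_output=True,
    )
    failures = []
    for case in ET.parse(report).getroot().iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append({
                "test": f"{case.get('classname')}::{case.get('name')}",
                "message": problem.get("message", ""),
                "traceback": (problem.text or "").strip(),
            })
    return failures  # an empty list means the suite is green
```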

Security validation: Connect your agents to the same security tools your humans use. SAST scanners, dependency checkers, container scanners—but with programmatic access and structured output. An agent that can run Semgrep, parse the SARIF output, and iterate on findings is doing actual security work. One that just generates code and waits for your security team to scan it is creating work.
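
A minimal sketch of that Semgrep loop, assuming the Semgrep CLI is installed; the `--config auto` policy and output paths are placeholders for your own security configuration:

```python
# Sketch: run Semgrep, emit SARIF, and reduce it to findings the
# agent can iterate on. The `--config auto` policy and file paths
# are placeholders for your own security configuration.
import json
import subprocess


def run_sast(repo_dir: str) -> list[dict]:
    out = f"{repo_dir}/semgrep.sarif"
    subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--sarif", "--output", out],
        cwd=repo_dir,
        capture_output=True,
    )
    with open(out) as f:
        sarif = json.load(f)
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            findings.append({
                "rule": result.get("ruleId"),
                "severity": result.get("level", "warning"),
                "file": loc["artifactLocation"]["uri"],
                "line": loc["region"]["startLine"],
                "message": result["message"]["text"],
            })
    return findings  # the agent remediates and re-scans until clean
```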

Deployment verification: If your agent can't deploy its changes to an isolated environment and verify they work, you're missing the most valuable feedback loop. This is where agents catch integration issues, performance regressions, and behavioral changes that don't show up in unit tests.
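
One way this can be wired up, as a sketch: the code below assumes the service is defined in a Docker Compose file and exposes a `/healthz` endpoint; both are assumptions about your stack, not requirements:

```python
# Sketch: stand the change up in an isolated environment and probe
# real behavior. Assumes a Docker Compose file and a /healthz
# endpoint; substitute whatever your stack actually exposes.
import subprocess
import time
import urllib.request


def verify_deployment(repo_dir: str,
                      base_url: str = "http://localhost:8080") -> bool:
    subprocess.run(
        ["docker", "compose", "up", "--build", "--detach"],
        cwd=repo_dir, check=True,
    )
    try:
        for _ in range(30):  # wait up to ~30s for the service
            try:
                with urllib.request.urlopen(f"{base_url}/healthz",
                                            timeout=2) as resp:
                    if resp.status == 200:
                        return True  # behavioral check passed
            except OSError:
                pass  # not up yet; retry
            time.sleep(1)
        return False  # never became healthy: a real integration signal
    finally:
        subprocess.run(["docker", "compose", "down"], cwd=repo_dir)
```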

Compliance documentation: For SOC 2 Type II or ISO 27001:2022 audits, you need evidence that security controls were applied. An agent that can document its security validation steps—"ran SAST, found 3 issues, remediated 3 issues, re-scanned clean"—is generating audit evidence automatically.
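
A sketch of what automatic evidence generation might look like: each validation step becomes a structured, timestamped entry in an append-only log. The schema, control name, and file path here are illustrative:

```python
# Sketch: record each validation step as a structured, timestamped
# entry in an append-only JSON Lines log. The schema, control name,
# and file path are illustrative; map them to what auditors expect.
import json
from datetime import datetime, timezone


def record_evidence(log_path: str, step: str,
                    findings: int, remediated: int) -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "control": "secure-development",
        "step": step,
        "findings": findings,
        "remediated": remediated,
        "clean": findings == remediated,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")


# "ran SAST, found 3 issues, remediated 3 issues, re-scanned clean":
record_evidence("audit_evidence.jsonl", "sast-scan",
                findings=3, remediated=3)
```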

The implementation pattern is straightforward: every manual verification step in your development workflow should have a programmatic equivalent that your agent can invoke. If you can't script it, your agent can't use it.

This doesn't mean building everything from scratch. You already have CI/CD pipelines, security scanners, and test frameworks. The work is exposing them through interfaces that agents can use autonomously—APIs instead of dashboards, structured output instead of human-readable reports, programmatic access instead of manual triggers.
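
One lightweight pattern for that interface layer: a registry that exposes each existing tool to the agent as a named, scriptable check with structured output. A sketch, with stub checks standing in for thin wrappers around your real tooling:

```python
# Sketch: a registry that exposes each existing tool to the agent as
# a named, scriptable check with structured output. The stub checks
# stand in for thin wrappers around your real CI/CD tooling.
from typing import Callable

Check = Callable[[str], dict]
REGISTRY: dict[str, Check] = {}


def check(name: str) -> Callable[[Check], Check]:
    def register(fn: Check) -> Check:
        REGISTRY[name] = fn
        return fn
    return register


@check("unit-tests")
def unit_tests(repo_dir: str) -> dict:
    return {"passed": True, "details": "stub: wrap your test runner"}


@check("sast")
def sast(repo_dir: str) -> dict:
    return {"passed": True, "details": "stub: wrap your scanner"}


def run_all(repo_dir: str) -> dict[str, dict]:
    # The agent calls this instead of clicking through dashboards.
    return {name: fn(repo_dir) for name, fn in REGISTRY.items()}
```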

When model capability matters: Model capability does matter, but not where teams think it does. Better models help with architectural understanding, complex refactoring, and navigating large codebases. If your agent struggles to understand your system's design or makes architecturally inappropriate changes, that's a model limitation.

The distinction: if your agent generates reasonable code but can't iterate to production quality without constant human intervention, that's a feedback problem. If it generates code that shows fundamental misunderstanding of your system's architecture, that's a model problem.

You'll also hit model limits when working with proprietary frameworks, internal DSLs, or highly specialized domains. Fine-tuning or retrieval-augmented generation can help here. But even the best model can't overcome a verification environment that requires human interpretation at every step.

The path forward isn't waiting for smarter models. It's building the infrastructure that lets your current models work autonomously. When your agents can validate their own work, iterate on failures, and produce production-ready code without human handholding, you'll see the productivity gains that make AI coding agents worth the investment.
