
300,000 Ollama Servers Leaking Memory

Three API calls. No authentication required. Full process memory exposed — including API keys, conversation history, and model weights.

CVE-2026-7482, now known as Bleeding Llama, turned approximately 300,000 internet-facing Ollama servers into open memory dumps. If your team runs Ollama for local LLM inference, your server may have been broadcasting everything in RAM to anyone who knew where to look.

What Happened

Researchers discovered that Ollama's API endpoints allowed unauthenticated attackers to extract the entire process memory of running servers through a three-step exploit:

  1. Call the model listing endpoint
  2. Request model details
  3. Trigger a memory extraction through a crafted API request
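The first two steps of that chain can be reproduced harmlessly as an exposure check. Here is a minimal sketch using Ollama's documented `/api/version` and `/api/tags` endpoints; the crafted extraction request from step three is deliberately omitted, and the version-comparison logic assumes standard semantic versioning:

```python
import json
import urllib.request
import urllib.error

# Endpoints from Ollama's public HTTP API: /api/version reports the server
# version, /api/tags lists installed models. The step-three extraction
# request is intentionally not reproduced here.
PROBE_PATHS = ["/api/version", "/api/tags"]

def classify(version):
    """Flag versions before the patched v0.17.1 as vulnerable."""
    if not version:
        return "unknown"
    parts = tuple(int(p) for p in version.lstrip("v").split(".")[:3])
    return "vulnerable" if parts < (0, 17, 1) else "patched"

def probe(base_url, timeout=3.0):
    """Check whether a server answers unauthenticated Ollama API calls."""
    findings = {}
    for path in PROBE_PATHS:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                findings[path] = json.load(resp)
        except (urllib.error.URLError, OSError, ValueError):
            findings[path] = None
    version = (findings.get("/api/version") or {}).get("version")
    findings["status"] = classify(version)
    return findings
```

A server that answers both calls without credentials is, at minimum, leaking its model inventory to anyone who can reach it.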

The vulnerability existed in Ollama versions prior to v0.17.1. A public proof-of-concept became available, making exploitation easy for anyone with basic scripting knowledge.

Timeline

The disclosure timeline reveals critical gaps in vulnerability management:

  • Initial discovery: Vulnerability identified by security researchers
  • Patch release: Ollama v0.17.1 shipped with a fix
  • CVE assignment: CVE-2026-7482 assigned post-patch
  • Public awareness: Release notes did not flag the security fix, delaying recognition of the issue's severity
  • Exposure window: Organizations running outdated versions remained vulnerable, with approximately 300,000 servers still exposed at the time of public disclosure

The delay between patch availability and clear security communication left a massive exposure window. Teams monitoring changelogs for security updates would have seen a routine version bump, not an urgent patch.

Which Controls Failed or Were Missing

Unauthenticated API access: Ollama's default configuration exposed API endpoints without authentication. Any network-accessible server accepted requests from any source.

Lack of input validation: The API accepted crafted requests that triggered unintended memory access, violating basic input sanitization principles.

Insecure defaults: Out-of-the-box deployment bound services to 0.0.0.0, making servers immediately internet-accessible without explicit configuration.

Inadequate security disclosure: The patch existed, but release notes buried the fix. No security advisory. No CVE reference in the changelog. Teams had no signal that v0.17.1 was anything more than a maintenance release.

Missing network segmentation: Organizations deployed Ollama servers directly on the internet rather than behind VPNs or internal networks, maximizing the attack surface.

What the Standards Require

OWASP ASVS v4.0.3 Requirement 5.1.3 mandates that all input — REST requests, URL parameters, HTTP headers — is validated using positive, allow-list validation. The memory extraction exploit bypassed any meaningful input validation.
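As an illustration of allow-list validation, a handler can reject anything that does not match an expected grammar before further processing. This is a hedged sketch, not Ollama's actual code; the model-name pattern is an assumed grammar for illustration:

```python
import re

# Allow-list pattern for a model name such as "llama3.2:8b". The exact
# grammar is an assumption for illustration, not Ollama's real rule.
MODEL_NAME = re.compile(r"^[A-Za-z0-9._-]{1,64}(:[A-Za-z0-9._-]{1,32})?$")

def validate_model_name(raw):
    """Reject anything that is not a plausible model identifier."""
    if not isinstance(raw, str) or not MODEL_NAME.fullmatch(raw):
        raise ValueError("malformed model name")
    return raw
```

Positive validation like this fails closed: a crafted request that does not look exactly like legitimate input never reaches deeper code paths.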

PCI DSS v4.0.1 Requirement 6.3.1 requires that security vulnerabilities are identified using reliable, industry-recognized sources and that systems are protected from known vulnerabilities. The lack of clear security flagging in release notes directly undermines this requirement — how can you protect against a vulnerability you don't know exists?

ISO/IEC 27001:2022 Control 8.8 addresses management of technical vulnerabilities, requiring organizations to obtain timely information about vulnerabilities and evaluate exposure. When vendors don't clearly communicate security fixes, this control breaks down at the source.

NIST SP 800-53 Rev. 5 Control AC-3 mandates access enforcement — every request should be authorized. Ollama's unauthenticated API design violated this fundamental principle.

SOC 2 Type II CC6.1 requires logical access controls, including authentication. Exposing memory extraction capabilities without authentication fails this criterion entirely.

The pattern here: multiple frameworks assume vendors will clearly communicate security issues. Ollama's release process failed that assumption.

Lessons and Action Items for Your Team

Audit your AI infrastructure now. If you run Ollama, verify you're on v0.17.1 or later. Check your network configuration — is your instance bound to 0.0.0.0? Can the internet reach it directly?
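A quick reachability check answers the binding question directly. This sketch probes Ollama's default port, 11434; the host you test from determines what it proves, so run it from outside your network perimeter to check internet exposure:

```python
import socket

OLLAMA_PORT = 11434  # Ollama's default API port

def is_listening(host, port=OLLAMA_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `is_listening("your.public.ip")` returns True from an external vantage point, the API is internet-reachable and needs to be re-bound or firewalled immediately.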

Default-deny network access for AI services. Ollama, vLLM, LocalAI, and similar tools should live behind VPNs or within internal networks. If you need external access, put them behind an authenticated reverse proxy. Configure your firewall rules before you start the service, not after.

Monitor CVE databases independently. Don't rely solely on vendor release notes. Set up automated alerts for CVEs mentioning your technology stack. Tools like CISA's Known Exploited Vulnerabilities catalog and GitHub's Dependabot can catch what changelogs miss.
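The KEV catalog is published as JSON, so the alerting can be a small scheduled job. This sketch assumes the feed's published shape (a top-level `vulnerabilities` array with `vendorProject` and `product` fields); the watchlist contents are example values for your own stack:

```python
import json
import urllib.request

KEV_URL = ("https://www.cisa.gov/sites/default/files/feeds/"
           "known_exploited_vulnerabilities.json")
WATCHLIST = {"ollama", "vllm", "localai"}  # example products in your stack

def match_products(catalog, watchlist):
    """Return KEV entries whose vendor or product matches the watchlist."""
    hits = []
    for vuln in catalog.get("vulnerabilities", []):
        haystack = "{} {}".format(
            vuln.get("vendorProject", ""), vuln.get("product", "")).lower()
        if any(name in haystack for name in watchlist):
            hits.append(vuln)
    return hits

def fetch_kev():
    """Download the current KEV catalog."""
    with urllib.request.urlopen(KEV_URL, timeout=10) as resp:
        return json.load(resp)
```

Wire `match_products(fetch_kev(), WATCHLIST)` into a daily cron job that pages when the hit list grows.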

Implement authentication at the infrastructure layer. Even if the application doesn't require it, your reverse proxy should. Use mutual TLS, API keys, or OAuth tokens. Never expose unauthenticated APIs to untrusted networks.
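The core of that proxy-layer gate is a single check performed before anything is forwarded upstream. A minimal sketch using a constant-time comparison; the `X-Api-Key` header name and `LLM_PROXY_KEY` variable are illustrative choices, not Ollama conventions:

```python
import hmac
import os

# Placeholder key source: in practice, load from a secret manager and
# rotate regularly rather than reading an environment variable.
API_KEY = os.environ.get("LLM_PROXY_KEY", "change-me")

def authorized(headers):
    """Constant-time check of the presented API key; the proxy should
    respond 401 and stop when this returns False."""
    presented = headers.get("X-Api-Key", "")
    return hmac.compare_digest(presented.encode(), API_KEY.encode())
```

Behind this gate, forward only to the model server on a loopback or internal address so it is never directly reachable; `hmac.compare_digest` avoids leaking the key through timing differences.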

Establish a rapid patch process for AI tools. Many organizations treat AI infrastructure as "experimental" and skip standard patch management. That ends now. Your LLM servers process sensitive data — they need the same patch SLAs as your production databases.

Validate your secure defaults. Review the default configurations of every AI tool in your stack. What ports do they bind to? What authentication do they require? What data do they log? Fix the defaults before deployment, because most teams never change them.

Create security-specific monitoring. Track unusual API patterns: high-volume requests to model endpoints, requests from unexpected geographic locations, or sequential calls matching known exploit patterns. AppTrana and similar WAF solutions can detect CVE-2026-7482 exploitation attempts, but you need monitoring in place first.
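Because the exploit chained a model-listing call and a model-details call before the extraction request, the first two steps make a detectable signature. This sketch flags clients that walk that sequence in order; the `/api/tags` and `/api/show` paths come from Ollama's public API, while the log shape (time-ordered `(client_ip, path)` pairs) is an assumption about your access logs:

```python
from collections import defaultdict

# Ordered prefix of the exploit chain; the step-three crafted request is
# deliberately not modeled here.
CHAIN = ["/api/tags", "/api/show"]

def suspicious_ips(events):
    """events: iterable of (client_ip, request_path) tuples in time order.
    Returns the set of IPs that completed the chain prefix in sequence."""
    progress = defaultdict(int)  # ip -> index of next expected CHAIN step
    flagged = set()
    for ip, path in events:
        if progress[ip] < len(CHAIN) and path == CHAIN[progress[ip]]:
            progress[ip] += 1
            if progress[ip] == len(CHAIN):
                flagged.add(ip)
    return flagged
```

Feed it a tail of your reverse-proxy access log and alert on any non-empty result; tune with a time window to cut false positives from legitimate clients.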

The Bleeding Llama vulnerability exposed a systemic problem: AI tooling moves faster than security practices. Ollama isn't unique — it's representative. Every open-source AI platform you deploy likely has a similar gap between fixing a security issue and clearly communicating it.

Your move: treat AI infrastructure like production infrastructure. Authenticate everything. Segment your networks. Monitor your CVE feeds. And never trust a changelog that doesn't explicitly flag security fixes.
