What Happened
Cyera disclosed a critical heap out-of-bounds read vulnerability in Ollama, a popular open-source framework for running large language models locally. The flaw, tracked as CVE-2026-7482 with a CVSS score of 9.1, allows remote attackers to leak process memory by sending a crafted model file to an Ollama server. Over 300,000 servers running Ollama versions before 0.17.1 remain vulnerable.
The vulnerability exposes whatever sensitive data sits in process memory at the moment of exploitation: API keys, customer data, internal system paths, authentication tokens, or fragments of documents the model recently processed. Cyera also identified two unpatched vulnerabilities in Ollama's Windows update mechanism that could enable persistent code execution, though specific CVE identifiers and CVSS scores for these flaws were not disclosed.
Timeline
The source article does not provide specific disclosure or exploitation dates. Here's what we know:
- Vulnerability window: All Ollama versions before 0.17.1
- Current status: Patch available in version 0.17.1
- Exposure: 300,000+ servers globally affected
- Disclosure source: Cyera security research team
The absence of a detailed timeline suggests this may have been a coordinated disclosure. Organizations should assume threat actors have access to the technical details now that the vulnerability is public.
Which Controls Failed or Were Missing
Patch Management Process
The 300,000 exposed servers suggest that organizations either lack automated update mechanisms for AI inference infrastructure or treat these systems as "set and forget" deployments. Many teams install Ollama for experimentation or proof-of-concept work, then leave it running without the same change management discipline applied to production services.
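If you want to sweep your network for vulnerable builds, Ollama reports its version over the same API the vulnerability travels through. Here is a minimal Go sketch, assuming Ollama's default port 11434 and its GET /api/version endpoint; the host list is a hypothetical stand-in for your own inventory:

```go
// ollama_version_audit.go - flag hosts running Ollama older than 0.17.1.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// patchedVersion is the first fixed release per the advisory.
var patchedVersion = []int{0, 17, 1}

// olderThan reports whether semver string v sorts before the patched release.
func olderThan(v string, patched []int) bool {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, want := range patched {
		got := 0
		if i < len(parts) {
			got, _ = strconv.Atoi(parts[i])
		}
		if got != want {
			return got < want
		}
	}
	return false // exactly the patched version
}

func main() {
	// Hypothetical inventory; substitute your own host list or CMDB export.
	hosts := []string{"10.0.1.15", "10.0.2.40"}
	client := &http.Client{Timeout: 3 * time.Second}

	for _, h := range hosts {
		resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/version", h))
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", h, err)
			continue
		}
		var body struct {
			Version string `json:"version"`
		}
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil {
			fmt.Printf("%s: unexpected response\n", h)
			continue
		}
		if olderThan(body.Version, patchedVersion) {
			fmt.Printf("%s: VULNERABLE (%s < 0.17.1)\n", h, body.Version)
		} else {
			fmt.Printf("%s: ok (%s)\n", h, body.Version)
		}
	}
}
```

Anything that reports a version below 0.17.1, or that you cannot reach to verify, belongs on the patch-now list.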
Input Validation at the Application Boundary
Ollama accepted and processed model files without sufficient bounds checking on memory operations. The heap out-of-bounds read suggests the parser trusted size fields or offsets within the model file format without validating them against actual buffer dimensions.
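Cyera has not published the exact parsing flaw, so the sketch below is not Ollama's code. It illustrates the general failure mode with a hypothetical length-prefixed field format: the two checks (offset within the buffer, declared size within the remaining bytes) are exactly what an out-of-bounds read implies were missing or insufficient.

```go
// parse_field.go - validate untrusted length fields before slicing.
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readField extracts a length-prefixed field from an untrusted buffer.
// A parser that skips these checks and slices blindly is exactly the
// pattern that produces an out-of-bounds read.
func readField(buf []byte, offset int) ([]byte, error) {
	if offset < 0 || offset+8 > len(buf) {
		return nil, errors.New("offset past end of buffer")
	}
	// The attacker controls this value; treat it as hostile.
	size := binary.LittleEndian.Uint64(buf[offset:])
	remaining := uint64(len(buf) - offset - 8)
	if size > remaining {
		return nil, fmt.Errorf("declared size %d exceeds remaining %d bytes", size, remaining)
	}
	return buf[offset+8 : offset+8+int(size)], nil
}

func main() {
	// A hostile file: 16 bytes on disk, but it claims a 1 GiB field.
	evil := make([]byte, 16)
	binary.LittleEndian.PutUint64(evil, 1<<30)

	if _, err := readField(evil, 0); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

A language note the Memory Safety Architecture section below expands on: in Go, a blind slice past the end of a buffer panics rather than silently reading adjacent heap memory; in C or C++, the same mistake returns whatever happens to live next to the buffer.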
Network Segmentation
The "remote attacker" classification means these Ollama instances were network-accessible. AI inference servers processing sensitive data should sit behind authentication layers and network controls that prevent arbitrary file uploads from untrusted sources.
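One inexpensive control is to bind Ollama to loopback (OLLAMA_HOST=127.0.0.1) and put a source-IP allowlist in front of it. A hedged Go sketch; the CIDR ranges and listen port are hypothetical placeholders for your own network layout:

```go
// allowlist_proxy.go - only forward requests from approved networks to Ollama.
package main

import (
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical internal ranges; replace with your approved CIDRs.
	var allowed []*net.IPNet
	for _, cidr := range []string{"10.20.0.0/16", "192.168.8.0/24"} {
		_, block, err := net.ParseCIDR(cidr)
		if err != nil {
			log.Fatal(err)
		}
		allowed = append(allowed, block)
	}

	upstream, _ := url.Parse("http://127.0.0.1:11434") // Ollama bound to loopback
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		ip := net.ParseIP(host)
		if err != nil || ip == nil {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		for _, block := range allowed {
			if block.Contains(ip) {
				proxy.ServeHTTP(w, r)
				return
			}
		}
		log.Printf("blocked %s %s %s", ip, r.Method, r.URL.Path)
		http.Error(w, "forbidden", http.StatusForbidden)
	})

	log.Fatal(http.ListenAndServe(":8443", handler))
}
```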
Memory Safety Architecture
The vulnerability exists because Ollama's model-loading path performs direct memory operations without runtime bounds enforcement. Ollama itself is written in Go, but its parsing and inference core wraps llama.cpp, which is C and C++ code with no such guarantees. Rewriting hot parsing paths in a memory-safe language like Rust, or isolating the unsafe parser behind a strictly validated boundary, could have prevented this class of bug.
Security Monitoring for AI Infrastructure
Most organizations lack detection rules for unusual activity on AI inference endpoints. An attacker probing for this vulnerability would send malformed model files — activity that should trigger alerts if you're monitoring file upload patterns and error rates on these services.
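A cheap starting point is an error-rate tripwire on the access logs of whatever sits in front of the service. The Go sketch below assumes one log line per request carrying a status=NNN field, an illustrative format rather than any particular proxy's, and a threshold you would tune against your own baseline:

```go
// error_spike.go - alert when an inference endpoint's error rate spikes,
// a crude but useful proxy for someone fuzzing the model-file parser.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

const (
	window    = time.Minute
	threshold = 20 // errors per window before alerting; tune to your baseline
)

func main() {
	var errTimes []time.Time
	scanner := bufio.NewScanner(os.Stdin)

	// Assumes one access-log line per request with a "status=NNN" field,
	// e.g. emitted by the reverse proxy in front of Ollama.
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "status=4") && !strings.Contains(line, "status=5") {
			continue // only count client and server errors
		}
		now := time.Now()
		errTimes = append(errTimes, now)

		// Drop entries that fell out of the sliding window.
		for len(errTimes) > 0 && now.Sub(errTimes[0]) > window {
			errTimes = errTimes[1:]
		}
		if len(errTimes) >= threshold {
			fmt.Printf("ALERT: %d request errors in the last %s - possible probing\n",
				len(errTimes), window)
			errTimes = errTimes[:0] // reset so we alert once per burst
		}
	}
}
```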
What the Relevant Standard Requires
PCI DSS v4.0.1 Requirements 6.3.1 and 6.3.3
Requirement 6.3.1 requires that security vulnerabilities be identified, risk-ranked, and managed using industry-recognized sources, and Requirement 6.3.3 requires that critical and high-severity patches be installed within one month of release. If your Ollama deployment processes payment card data or sits within the cardholder data environment, a CVSS 9.1 flaw unambiguously qualifies: you must patch within 30 days of the vendor patch becoming available.
ISO/IEC 27001:2022 Control 8.8 (Management of Technical Vulnerabilities)
Organizations must establish a process to identify technical vulnerabilities, evaluate associated risks, and take appropriate action. For internet-facing services with critical vulnerabilities, this means emergency patching outside normal change windows.
NIST SP 800-53 Rev 5 SI-2 (Flaw Remediation)
Requires organizations to install security-relevant software updates within defined timeframes based on risk. A CVSS 9.1 vulnerability affecting an internet-accessible service would typically trigger your organization's "critical" response timeline — often 72 hours or less.
SOC 2 Type II CC7.1 (System Monitoring)
Your controls must detect and respond to security events. If you're running Ollama in production and claiming SOC 2 compliance, auditors will ask how you detected this vulnerability, how quickly you patched, and what monitoring you have in place to detect exploitation attempts.
OWASP ASVS v4.0.3 Requirement 8.3.6
"Verify that sensitive information contained in memory is overwritten as soon as it is no longer required to mitigate memory dumping attacks." While this requirement addresses secure memory handling in custom code, the principle applies to third-party components: your architecture should minimize the exposure window for sensitive data in process memory, especially in services that parse untrusted input.
Lessons and Action Items for Your Team
Immediate Actions (This Week)
Inventory every Ollama instance in your environment. Check container registries, developer workstations, cloud instances, and that "temporary" demo server someone spun up six months ago. Upgrade all instances to version 0.17.1 or later. If you cannot patch immediately, take the service offline or restrict network access to known-good source IPs only.
Review what data these Ollama instances can access. If a server processes customer documents, internal code, or authentication credentials, assume that data could have leaked and assess your incident response obligations. You may need to notify affected parties depending on your jurisdiction and the data classification.
This Month
Add AI inference infrastructure to your vulnerability management program. These systems need the same patch cadence as your web application tier. If you're using Ollama in production, subscribe to the project's security advisories and establish an owner responsible for updates.
Implement authentication and authorization for all Ollama endpoints. The default configuration accepts any model file from any source. At minimum, require API authentication and validate that uploaded models come from your organization's approved model registry.
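The registry check can be as simple as a digest allowlist enforced before any model file is loaded. A minimal Go sketch; the approved map stands in for an export from whatever model registry you actually run, and the digest shown is just the SHA-256 of empty input as a placeholder:

```go
// model_gate.go - refuse to load model files whose digest is not on the
// organization's approved list. Usage: model_gate <path-to-model-file>
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// approved maps SHA-256 digests to model names; in practice this would be
// exported from your model registry, not hardcoded. The entry below is the
// digest of empty input, used here purely as a placeholder.
var approved = map[string]string{
	"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855": "empty-placeholder",
}

func verifyModel(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	digest := hex.EncodeToString(h.Sum(nil))

	if name, ok := approved[digest]; ok {
		fmt.Printf("%s matches approved model %q\n", path, name)
		return nil
	}
	return fmt.Errorf("%s (sha256:%s) is not in the approved registry", path, digest)
}

func main() {
	if err := verifyModel(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "refusing to load:", err)
		os.Exit(1)
	}
}
```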
Deploy network-level controls. Place Ollama servers behind a reverse proxy that logs all requests, enforces rate limiting, and blocks requests from unexpected geographic regions or IP ranges. Configure your SIEM to alert on unusual file upload patterns or error spikes.
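Putting those pieces together, here is a hedged sketch of such a guarding proxy in Go. The bearer token is a placeholder you would load from a secrets manager, and per-client rate limiting uses the golang.org/x/time/rate token-bucket package:

```go
// guard_proxy.go - front Ollama with request logging, a bearer-token check,
// and per-client rate limiting.
package main

import (
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"

	"golang.org/x/time/rate"
)

const apiToken = "replace-with-a-real-secret" // hypothetical; load from a vault

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

// limiterFor returns a per-client token bucket: 5 req/s, bursts of 10.
func limiterFor(client string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[client]
	if !ok {
		l = rate.NewLimiter(5, 10)
		limiters[client] = l
	}
	return l
}

func main() {
	upstream, _ := url.Parse("http://127.0.0.1:11434") // Ollama on loopback
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("%s %s %s", r.RemoteAddr, r.Method, r.URL.Path) // ship to SIEM
		if r.Header.Get("Authorization") != "Bearer "+apiToken {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			host = r.RemoteAddr // key the limiter on the raw address as a fallback
		}
		if !limiterFor(host).Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8443", handler))
}
```

Geographic blocking and richer alerting belong in your WAF or edge tier; the point of the sketch is that even a thin guard in front of Ollama produces the request log your SIEM rules need.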
This Quarter
Develop security requirements for AI infrastructure. Your existing security standards likely don't address model file validation, inference server hardening, or AI-specific attack vectors. Document baseline security controls for any service that loads and executes models.
For the Windows update mechanism vulnerabilities that remain unpatched, evaluate whether you need to run Ollama on Windows at all. Linux containers offer better isolation and more mature security tooling. If Windows is required, implement application control policies that prevent unauthorized executables from running, even if an attacker compromises the update process.
Conduct a tabletop exercise around AI infrastructure compromise. Walk through the scenario: an attacker exploits a memory leak in your inference server and extracts API keys. Who gets paged? What data might be exposed? What's your notification timeline? Test your runbooks before you need them in production.
The Ollama vulnerability demonstrates that AI infrastructure carries the same security risks as any other internet-facing service — and often receives less scrutiny. Treat these systems with the same rigor you apply to your web tier, or accept that your next incident report might start with "our AI model server leaked customer data."