What Happened
Between October 2025 and January 2026, GreyNoise captured 91,403 distinct attack sessions targeting publicly exposed LLM inference endpoints. The primary target was Ollama instances deployed without authentication. A subsequent scan identified 175,000 unique Ollama hosts accessible from the internet across 130 countries.
Attackers built automated tools to identify exposed endpoints, validate their accessibility, and either exfiltrate data from models or use the compute resources for their own inference workloads. The campaign, named "Operation Bizarre Bazaar," documented attackers monetizing access to these endpoints by reselling inference capacity or extracting proprietary model weights.
Timeline
October 2025: Attack sessions begin appearing in GreyNoise telemetry, targeting the default Ollama port (11434/TCP) and common reverse proxy paths.
November 2025: Attack volume increases as scanning tools proliferate. Attackers begin fingerprinting exposed endpoints to identify model types and capabilities.
December 2025: Operation Bizarre Bazaar investigation reveals organized monetization of compromised endpoints. Attackers establish persistent access to high-capability hosts.
January 2026: Global scan identifies 175,000 exposed Ollama instances. Attack sessions peak at over 30,000 per month.
Which Controls Failed or Were Missing
Authentication and Access Control
Ollama ships with no authentication enabled by default. Teams deploying it assumed the application would run on internal networks or behind existing access controls. Instead, instances were exposed directly to the internet or placed behind reverse proxies that didn't enforce authentication at the application layer.
The failure: No requirement for authentication before processing inference requests. An attacker who could reach the endpoint could submit arbitrary prompts and retrieve responses.
Network Segmentation
Exposed instances were reachable from any internet source. No firewall rules, no IP allowlisting, no VPN requirement. In many cases, teams deployed Ollama on cloud instances with security groups allowing inbound traffic from 0.0.0.0/0 for rapid testing, then never tightened the rules before moving to production workloads.
The failure: AI infrastructure treated as low-risk development tooling rather than systems processing potentially sensitive data.
Security Configuration Baselines
Default configurations persisted into production. No hardening guides were followed. No security reviews occurred before deployment. Teams prioritized speed of deployment over security posture, assuming AI tools carried the same risk profile as internal developer utilities.
The failure: No documented security baseline for AI infrastructure. No change control process requiring security review before internet exposure.
Monitoring and Detection
Organizations didn't monitor for unusual inference patterns, unexpected source IPs, or abnormal request volumes. The first indication of compromise was often a cloud bill showing unexpected compute usage, not a security alert.
The failure: No logging of inference requests. No anomaly detection on API usage patterns. No alerting on access from unexpected geographic regions.
What the Relevant Standards Require
ISO/IEC 27001:2022 Control 8.2 (Privileged Access Rights)
"The allocation and use of privileged access rights shall be restricted and managed."
LLM endpoints capable of processing arbitrary prompts and returning model outputs constitute privileged access to AI capabilities. ISO 27001 requires you to control who can access these systems and under what conditions. Leaving inference APIs open to the internet violates this control entirely.
Your action: Implement authentication for all AI endpoints. Use API keys at minimum, OAuth 2.0 or mutual TLS for higher-risk deployments.
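As a minimal sketch of the API-key option, the check below validates a bearer token against a set of issued keys. The header name and key-distribution mechanism are assumptions; adapt them to however your gateway passes credentials.

```python
import hmac

def authorized(headers: dict, valid_keys: set) -> bool:
    """Return True only if the request carries a known API key.

    Assumes keys arrive as 'Authorization: Bearer <key>'.
    """
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    presented = auth[len("Bearer "):]
    # Constant-time comparison avoids leaking key material via timing.
    return any(hmac.compare_digest(presented, k) for k in valid_keys)
```

Rejecting requests before they reach the model keeps unauthenticated traffic from ever consuming inference capacity.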
NIST CSF v1.1 PR.AC-3 (Remote Access)
"Remote access is managed to prevent unauthorized access."
The framework requires you to manage remote access through authentication, encryption, and monitoring. Every one of these Ollama instances accepted remote access from any source without authentication.
Your action: Deploy AI infrastructure behind VPNs or zero-trust network access controls. If you must expose endpoints externally, require strong authentication and log every access attempt.
PCI DSS v4.0.1 Requirement 1.4.2
"Inbound traffic from untrusted networks to trusted networks is restricted."
If your AI systems process or have access to cardholder data environments, this requirement mandates firewall rules that restrict inbound traffic. Exposing an LLM endpoint directly to the internet without access controls fails this requirement.
Your action: Define network security zones for AI infrastructure. Implement firewall rules that deny by default and permit only necessary traffic from authorized sources.
OWASP ASVS v4.0.3 Requirement 4.1.1
"Verify that the application enforces access control rules on a trusted service layer."
Even if you deploy a reverse proxy or API gateway in front of your LLM, the application layer must enforce access control. Relying solely on network controls creates a single point of failure.
Your action: Configure application-level authentication in your LLM deployment. Don't assume network controls are sufficient.
Lessons and Action Items for Your Team
Immediate Actions (This Week)
Scan your external attack surface for exposed AI endpoints. Search for Ollama, vLLM, LocalAI, or custom inference servers listening on public IPs. Tools like Shodan and Censys can help, but start with your own asset inventory.
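A quick self-check can probe your own hosts for the Ollama signature: its `/api/tags` endpoint returns a JSON object with a `models` list when no authentication is in place. The probe below is a sketch for auditing infrastructure you own, not for scanning third parties.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default listen port

def looks_like_ollama(body: bytes) -> bool:
    """Heuristic: /api/tags on an open Ollama server returns {"models": [...]}."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and isinstance(data.get("models"), list)

def probe(host: str, timeout: float = 3.0) -> bool:
    """Return True if `host` answers like an unauthenticated Ollama API."""
    url = f"http://{host}:{OLLAMA_PORT}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and looks_like_ollama(resp.read())
    except OSError:
        return False
```

Run `probe()` against every public IP in your inventory; any True result is an endpoint an attacker can also find.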
Enable authentication on every LLM endpoint. If the product doesn't support built-in auth, deploy it behind an authenticating reverse proxy. Generate API keys, distribute them to authorized users, and rotate them quarterly.
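Key issuance and quarterly rotation can be as simple as the sketch below, using Python's `secrets` module for entropy. The 90-day window mirrors the quarterly rotation above; the storage and distribution of keys is left to your secrets manager.

```python
import secrets
from datetime import date, timedelta

ROTATION_DAYS = 90  # quarterly rotation, per the action item

def issue_key():
    """Generate a high-entropy API key and its expiry date."""
    key = secrets.token_urlsafe(32)  # ~256 bits of entropy
    expires = date.today() + timedelta(days=ROTATION_DAYS)
    return key, expires

def needs_rotation(expires: date, today: date = None) -> bool:
    """True once a key has reached its expiry date."""
    return (today or date.today()) >= expires
```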
Review firewall rules for AI infrastructure. Remove 0.0.0.0/0 rules. Implement IP allowlisting for known source ranges. Require VPN access for internal teams.
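The allowlist logic is deny-by-default: a source is permitted only if it falls inside an approved range. The CIDR blocks below are hypothetical placeholders (a VPN egress range and a documentation-range office block); substitute your own.

```python
import ipaddress

# Hypothetical allowlist: a VPN egress range and an office block.
ALLOWED = [ipaddress.ip_network(c) for c in ("10.8.0.0/16", "203.0.113.0/24")]

def source_permitted(ip: str) -> bool:
    """Deny by default: permit only sources inside an allowlisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOWED)
```

The same check belongs in your cloud security groups, but enforcing it in the application as well means a misconfigured firewall rule fails closed rather than open.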
Short-Term Actions (This Month)
Implement request logging for all inference APIs. Capture timestamp, source IP, authenticated user, prompt length, and response status. Send logs to your SIEM.
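One structured JSON record per inference call, as sketched below, gives your SIEM the fields listed above. Note that it logs the prompt *length*, not the prompt itself, to keep potentially sensitive content out of the log pipeline; the logger name is an assumption.

```python
import json
import logging
from datetime import datetime, timezone

log = logging.getLogger("inference-audit")  # hypothetical logger name

def log_request(source_ip: str, user: str, prompt: str, status: int) -> str:
    """Emit one SIEM-friendly JSON record per inference call."""
    record = json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "src": source_ip,
        "user": user,
        "prompt_len": len(prompt),  # length only, never the prompt body
        "status": status,
    })
    log.info(record)
    return record
```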
Deploy rate limiting. Set per-user and per-IP thresholds for inference requests. Alert on violations. This won't stop determined attackers but will slow automated exploitation.
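A per-key token bucket is one common way to implement those thresholds; the sketch below refills tokens continuously and caps bursts, with rate and burst values you would tune per user tier.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-key token bucket: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens = defaultdict(lambda: float(burst))
        self.stamp = {}

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self.stamp.get(key, now)
        self.stamp[key] = now
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens[key] = min(self.burst,
                               self.tokens[key] + (now - last) * self.rate)
        if self.tokens[key] >= 1:
            self.tokens[key] -= 1
            return True
        return False
```

Key the bucket on authenticated user for fairness and on source IP as a backstop against unauthenticated scanners.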
Create security configuration baselines for AI deployments. Document required settings for authentication, network exposure, logging, and monitoring. Require security review before any AI system accepts production traffic.
Long-Term Actions (This Quarter)
Build anomaly detection for AI usage. Alert on requests from new geographic regions, unusual request volumes, or prompts containing patterns associated with data exfiltration attempts.
Implement model access controls. Not every user needs access to every model. Define roles and permissions that restrict which models each API key can access.
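Role-based model access can be enforced with a deny-by-default lookup like the sketch below. The role names, key identifiers, and model names are all hypothetical placeholders.

```python
# Hypothetical role-to-model mapping; names are illustrative only.
ROLE_MODELS = {
    "support-bot": {"llama3:8b"},
    "research": {"llama3:8b", "llama3:70b", "internal-finetune"},
}

KEY_ROLES = {"key-abc123": "support-bot", "key-def456": "research"}

def can_use_model(api_key: str, model: str) -> bool:
    """Deny by default: a key may only invoke models its role lists."""
    role = KEY_ROLES.get(api_key)
    return model in ROLE_MODELS.get(role, set())
```

An unknown key, an unknown role, or an unlisted model all fall through to a denial, which is the failure mode you want.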
Conduct tabletop exercises for AI incidents. Walk through scenarios where an attacker gains access to your inference endpoints. Test your detection, response, and recovery procedures.
The 175,000 exposed Ollama hosts represent organizations that deployed AI infrastructure without applying the same security rigor they use for databases, APIs, or web applications. Your LLM endpoints process data and consume compute resources. Treat them accordingly.