Hijacked ML Models via Bucket Squatting

A vulnerability in the Google Cloud Vertex AI SDK allowed attackers to replace legitimate machine learning models with malicious ones without needing any of the victim's credentials. Discovered by Palo Alto Networks Unit 42 and reported through Google's Vulnerability Reward Program, the flaw let an attacker exploit predictable Cloud Storage bucket names to carry out a "Pickle in the Middle" attack.

Google addressed the issue in SDK version 1.148.0 by adding bucket ownership verification. However, this incident highlights a broader issue: predictable resource naming in cloud environments creates exploitable attack surfaces that can bypass traditional access controls.

What Happened

The vulnerability enabled attackers to hijack model uploads by creating a Cloud Storage bucket with a predictable name derived from the victim's project ID and region. When a victim uploaded a model through the Vertex AI SDK, the SDK wrote the upload to the attacker's bucket instead of a legitimate one.

The attack was possible because the SDK generated bucket names using a deterministic pattern based on publicly available information. An attacker who knew your project ID and deployment region could pre-create the bucket your SDK would use, then wait for your upload.
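To make the predictability concrete, here is a minimal sketch. The naming pattern, the project ID, and both function names are assumptions chosen for illustration; the source does not give the SDK's exact format. The point is only that a name built purely from public inputs can be computed by anyone, while a name with a random suffix cannot.

```python
import uuid


def predictable_bucket_name(project_id: str, region: str) -> str:
    # Hypothetical pattern: every input is discoverable by an outsider.
    return f"{project_id}-vertex-staging-{region}"


def randomized_bucket_name(project_id: str, region: str) -> str:
    # Same hypothetical pattern plus a random suffix the attacker cannot guess.
    suffix = uuid.uuid4().hex[:16]
    return f"{project_id}-vertex-staging-{region}-{suffix}"


# The victim's tooling and the attacker compute the same name independently.
victim_name = predictable_bucket_name("acme-ml-prod", "us-central1")
attacker_guess = predictable_bucket_name("acme-ml-prod", "us-central1")
names_collide = victim_name == attacker_guess

# Two randomized names for the same project and region do not match.
first_random = randomized_bucket_name("acme-ml-prod", "us-central1")
second_random = randomized_bucket_name("acme-ml-prod", "us-central1")
```

A randomized name has to be stored somewhere (configuration, state file) because it can no longer be recomputed from the project ID and region, and long project IDs may need a shorter prefix to stay within Cloud Storage's bucket name length limit.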

Once the attacker's bucket received the model file, the attacker could swap it for a poisoned version that the victim's pipeline would later load, which gave the attacker control over code executed in that pipeline. Since many ML frameworks use Python's pickle format for serialization, a poisoned model file could execute arbitrary code when loaded.
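The following harmless sketch shows why loading a pickle is equivalent to running code. The class name Poisoned is hypothetical; the mechanism (an object's __reduce__ method telling pickle which callable to invoke at load time) is standard Python behavior. Here the callable is eval on a trivial arithmetic expression, but an attacker could name any importable function.

```python
import pickle


class Poisoned:
    """Stand-in for a tampered model object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        return (eval, ("6 * 7",))


# The attacker serializes the object and places the bytes in the bucket.
payload = pickle.dumps(Poisoned())

# The victim only calls pickle.loads, yet eval runs during deserialization.
loaded = pickle.loads(payload)
print(loaded)  # 42: the result of the attacker-chosen call, not a Poisoned instance
```

Note that the victim's environment does not need the Poisoned class at all; the payload references only the callable and its arguments.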

Timeline

  • Discovery phase: Palo Alto Networks Unit 42 identified the predictable naming pattern in the Vertex AI SDK and demonstrated the attack vector.
  • Disclosure: Researchers reported the vulnerability through Google's Vulnerability Reward Program.
  • Remediation: Google released version 1.148.0 of the SDK with bucket ownership verification, preventing the attack by confirming the bucket belongs to the expected project before use.
  • Current state: Organizations using SDK versions prior to 1.148.0 remain vulnerable. The fix requires updating the SDK; there is no server-side mitigation.
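A quick way to check an environment against the fixed version is sketched below. It assumes the Python SDK, distributed as google-cloud-aiplatform; the helper names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "1.99.5" above "1.148.0".

```python
import re
from importlib import metadata

MIN_PATCHED = (1, 148, 0)  # fixed version named in this write-up


def parse_version(text: str) -> tuple:
    # Keep only the leading MAJOR.MINOR.PATCH digits (ignores suffixes like rc1).
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text: str, minimum: tuple = MIN_PATCHED) -> bool:
    return parse_version(version_text) >= minimum


def installed_sdk_is_patched(dist_name: str = "google-cloud-aiplatform"):
    # Returns None when the distribution is not installed in this environment.
    try:
        return is_patched(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None
```

Running installed_sdk_is_patched() in each deployment image or virtual environment gives a per-environment answer; a lock file scan is still needed for environments that are built later.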

Which Controls Failed or Were Missing

  • Lack of resource ownership verification: The SDK accessed Cloud Storage buckets without verifying they belonged to the expected Google Cloud project, violating the principle of verifying trust boundaries.
  • Predictable resource naming: Deterministic naming patterns based on project metadata created a race condition. An attacker could predict and claim the bucket name before the legitimate user.
  • Missing namespace isolation: Cloud Storage bucket names exist in a single global namespace, so a name is not reserved for any project until someone creates the bucket. The SDK didn't enforce project-level isolation, allowing external actors to squat on names it assumed belonged to the project owner.
  • Insufficient input validation: The SDK trusted that a bucket matching the expected name pattern was correct, without additional verification of ownership or origin.
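The missing ownership check can be sketched as follows. This is not the SDK's actual fix, only an illustration of the idea under stated assumptions: it uses the google-cloud-storage client library, compares the bucket's project_number metadata against a project number you supply, and treats a permission or not-found error as a failed check. The function and exception names are hypothetical.

```python
from typing import Optional


class BucketOwnershipError(RuntimeError):
    """Raised when a bucket cannot be shown to belong to the expected project."""


def check_project_number(actual: Optional[int], expected: int, bucket_name: str) -> None:
    # Pure comparison, kept separate so it can be tested without network access.
    if actual is None or int(actual) != int(expected):
        raise BucketOwnershipError(
            f"bucket {bucket_name!r} belongs to project number {actual}, "
            f"expected {expected}"
        )


def verify_bucket_owner(bucket_name: str, expected_project_number: int) -> None:
    # Imported lazily so the comparison above works without the library installed.
    from google.api_core import exceptions as gexc
    from google.cloud import storage

    client = storage.Client()
    try:
        bucket = client.get_bucket(bucket_name)  # loads bucket metadata
    except (gexc.Forbidden, gexc.NotFound) as exc:
        # If the metadata cannot be read, ownership cannot be confirmed.
        raise BucketOwnershipError(
            f"could not read metadata for bucket {bucket_name!r}"
        ) from exc
    check_project_number(bucket.project_number, expected_project_number, bucket_name)
```

Calling verify_bucket_owner() immediately before every upload, rather than once at startup, narrows the window in which a bucket could be deleted and re-created by someone else.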

What the Relevant Standards Require

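  • NIST 800-53 Rev 5 SC-12 covers cryptographic key establishment and management rather than resource naming, so it applies only by analogy. A closer fit is SI-7 (Software, Firmware, and Information Integrity), which calls for verifying integrity before relying on software or data. Here, a predictable bucket name functioned as implicit authorization with no such check behind it.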
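  • ISO/IEC 27001:2022 Annex A 8.24 governs the use of cryptography, so it is relevant only indirectly. The asset-management controls (for example A 5.9, inventory of information and other associated assets) map more directly: a bucket receiving your ML models is an information asset, and the SDK failed to properly identify it before use.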
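  • OWASP ASVS v4.0.3 Requirement 3.1.1 concerns session tokens exposed in URL parameters and is not a direct match. The broader principle in ASVS still applies: applications should not rely on predictable identifiers for security-sensitive operations.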
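  • PCI DSS v4.0.1 Requirement 6.2.4 requires software engineering techniques that prevent or mitigate common attacks in bespoke and custom software, and Requirement 6.3.3 requires installing applicable security patches. The SDK update to 1.148.0 is a security patch that should be prioritized.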
  • SOC 2 Type II CC6.1 requires logical access security measures to protect against external threats, like an attacker intercepting your uploads.

Lessons and Action Items for Your Team

  • Audit your cloud resource naming conventions. Scan your Infrastructure as Code for predictable naming patterns in security-sensitive resources. If an attacker can predict a resource name and pre-create it, you have a squatting vulnerability.

    Action: Run a grep across your Terraform, CloudFormation, and Pulumi code for string interpolation in resource names. Flag any pattern that uses only project metadata without a random component.

  • Update the Vertex AI SDK to version 1.148.0 or later. Check your dependency manifest. If you're using a version below 1.148.0, you're vulnerable.

    Action: Update the SDK version and test your model deployment pipeline. The ownership verification added in 1.148.0 may reveal misconfigurations where you were using buckets you don't control.

  • Implement bucket ownership verification in your own code. If you're directly manipulating Cloud Storage buckets, verify ownership before writing sensitive data. Query the bucket's metadata and confirm that its project number matches the project you expect; treat a permission error on that lookup as a failed check.

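    Action: Add a pre-flight check to your upload functions that validates bucket ownership. For AWS S3, pass the ExpectedBucketOwner parameter on your S3 calls so that requests fail when the bucket belongs to a different account (get_bucket_location() returns only the bucket's region, not its owner). For Azure, check the storage account's resource group.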

  • Add entropy to security-sensitive resource names. Append a random suffix to bucket names to make pre-creation attacks computationally infeasible.

    Action: Update your resource creation logic to append uuid.uuid4().hex[:16] or equivalent to bucket names. Store the full name in your configuration management system.

  • Review your ML pipeline for pickle deserialization. If you're loading models serialized with Python's pickle module, you're executing code from those files. A poisoned model can run arbitrary commands.

    Action: Use safer serialization formats like ONNX or SavedModel. If you must use pickle, implement signature verification: sign your models when you produce them and validate the signature before loading.

  • Test your dependency update process. This vulnerability required updating a library. How long would it take your team to deploy an SDK update across all environments?

    Action: Run a fire drill. Pick a non-critical dependency and update it to the latest version. Measure time from decision to production deployment. If it takes more than 48 hours, identify the bottlenecks.
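The signature check suggested for pickle-based models could look roughly like the sketch below. It uses an HMAC with a shared secret from Python's standard library; the function names are hypothetical, and key storage and distribution (a secret manager, for example) are left out. Teams that need to separate signing from verification would use an asymmetric signature scheme instead.

```python
import hashlib
import hmac
import pickle


def sign_model(data: bytes, key: bytes) -> str:
    # HMAC-SHA256 over the exact bytes that will be uploaded.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def load_verified_model(data: bytes, signature: str, key: bytes):
    expected = sign_model(data, key)
    # Constant-time comparison; verify BEFORE any bytes reach pickle.loads.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("model signature mismatch; refusing to unpickle")
    return pickle.loads(data)


# Producer side: serialize, sign, then upload both the bytes and the signature.
key = b"example-key-load-from-a-secret-manager"
model_bytes = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
signature = sign_model(model_bytes, key)

# Consumer side: the model loads only if the bytes are unchanged.
model = load_verified_model(model_bytes, signature, key)
```

The signature must travel through a channel the attacker cannot write to. Storing it next to the model in the same bucket would let an attacker who controls that bucket replace both, unless the key itself stays secret.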

The Vertex AI SDK vulnerability demonstrates that cloud security isn't just about IAM policies and network controls. Predictable behavior in trusted libraries creates attack surfaces that bypass your perimeter defenses. Your security model must account for the possibility that the infrastructure you think you're using isn't actually yours.
