SBOM Validation Script for Java Projects

Your organization now requires a Software Bill of Materials (SBOM) for every release, and your CI/CD pipeline generates them automatically. However, a recent study of 25,882 Java SBOMs revealed that 7,907 failed to disclose direct dependencies, with 4.97% of those omissions containing known vulnerabilities.

This validation script helps your team catch noncompliant SBOMs before they leave your pipeline. It checks against NTIA minimum requirements and flags common compliance failures specific to Java projects.

Script Capabilities

This Python script validates Java SBOMs (CycloneDX or SPDX format) against three critical compliance checks:

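  1. Direct Dependency Completeness: Compares the components declared in your SBOM against the Maven coordinates (pom.properties) found inside the JARs in your build output. Every JAR in that directory is compared, whether it is a direct or a transitive dependency.
  2. NTIA Minimum Elements: Checks that the SBOM carries a timestamp, an author, and at least one fully identified component. The other NTIA elements (supplier name, unique identifier, dependency relationships) are not checked by the script as written and need to be added as custom rules.
  3. Vulnerability Disclosure: Provides a hook, check_vulnerabilities(), for cross-referencing declared dependencies against known CVE databases. The version shown here is a placeholder that you connect to your own data source.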

The script prints a report that lists each failure and warning, and it exits with a nonzero status when any check fails. Run it in your CI pipeline after SBOM generation but before artifact publication.

Prerequisites

Before running this script, ensure you have:

  • Python 3.9 or later
  • Access to your build output directory (where compiled JARs reside)
  • A generated SBOM file (CycloneDX JSON or SPDX JSON format)
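  • Network access to query the NVD API (or a local vulnerability database), needed once check_vulnerabilities() is connected to a data source
  • Installed dependencies: pip install requests. The script as shown parses the SBOM JSON directly, so cyclonedx-python-lib and spdx-tools are only needed if you extend it to use those libraries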

Your SBOM generator must already be integrated into your build process. This script validates output—it doesn't create SBOMs. If you're using Maven, ensure cyclonedx-maven-plugin or spdx-maven-plugin runs before this validation step.

The Validation Script

#!/usr/bin/env python3
"""
SBOM Compliance Validator for Java Projects
Validates against NTIA minimum requirements and dependency completeness
"""

import json
import sys
import zipfile
from pathlib import Path
from typing import Set, Dict, List

import requests  # not used yet; reserved for the lookup in check_vulnerabilities()

class SBOMValidator:
    def __init__(self, sbom_path: str, build_dir: str):
        self.sbom_path = Path(sbom_path)
        self.build_dir = Path(build_dir)
        self.failures = []
        self.warnings = []
        
    def extract_jar_dependencies(self) -> Set[str]:
        """Extract actual dependencies from JAR manifests and pom.properties"""
        dependencies = set()
        
        for jar_file in self.build_dir.rglob("*.jar"):
            try:
                with zipfile.ZipFile(jar_file, 'r') as jar:
                    # Check META-INF for Maven metadata
                    for name in jar.namelist():
                        if name.startswith('META-INF/maven/') and name.endswith('pom.properties'):
                            props = jar.read(name).decode('utf-8')
                            # Parse groupId:artifactId:version
                            group_id = self._extract_property(props, 'groupId')
                            artifact_id = self._extract_property(props, 'artifactId')
                            version = self._extract_property(props, 'version')
                            if all([group_id, artifact_id, version]):
                                dependencies.add(f"{group_id}:{artifact_id}:{version}")
            except Exception as e:
                self.warnings.append(f"Could not parse {jar_file.name}: {str(e)}")
                
        return dependencies
    
    def _extract_property(self, props: str, key: str) -> str:
        """Extract property value from properties file content"""
        for line in props.split('\n'):
            if line.startswith(f"{key}="):
                return line.split('=', 1)[1].strip()
        return ""
    
    def load_sbom(self) -> Dict:
        """Load and parse SBOM file"""
        with open(self.sbom_path, 'r', encoding='utf-8') as f:
            sbom = json.load(f)
        
        # Detect format
        if 'bomFormat' in sbom and sbom['bomFormat'] == 'CycloneDX':
            return self._parse_cyclonedx(sbom)
        elif 'spdxVersion' in sbom:
            return self._parse_spdx(sbom)
        else:
            raise ValueError("Unknown SBOM format. Expected CycloneDX or SPDX.")
    
    def _parse_cyclonedx(self, sbom: Dict) -> Dict:
        """Extract relevant fields from CycloneDX SBOM"""
        metadata = sbom.get('metadata', {})
        components = list(sbom.get('components', []))
        # The project itself is described in metadata.component. Include it so its
        # own JAR in the build directory is not reported as an undisclosed dependency.
        if metadata.get('component'):
            components.append(metadata['component'])
        dependencies = set()
        
        for comp in components:
            group = comp.get('group', '')
            name = comp.get('name', '')
            version = comp.get('version', '')
            if group and name and version:
                dependencies.add(f"{group}:{name}:{version}")
        
        return {
            'format': 'CycloneDX',
            'dependencies': dependencies,
            'metadata': metadata,
            'timestamp': metadata.get('timestamp'),
            'author': metadata.get('authors', [])
        }
    
    def _parse_spdx(self, sbom: Dict) -> Dict:
        """Extract relevant fields from SPDX SBOM"""
        packages = sbom.get('packages', [])
        dependencies = set()
        
        for pkg in packages:
            name = pkg.get('name', '')
            version = pkg.get('versionInfo', '')
            # SPDX often uses name format: group:artifact
            if ':' in name and version:
                dependencies.add(f"{name}:{version}")
        
        return {
            'format': 'SPDX',
            'dependencies': dependencies,
            'metadata': sbom,
            'timestamp': sbom.get('creationInfo', {}).get('created'),
            'author': sbom.get('creationInfo', {}).get('creators', [])
        }
    
    def validate_ntia_minimum(self, sbom_data: Dict) -> bool:
        """Check NTIA minimum elements"""
        valid = True
        
        # Check timestamp
        if not sbom_data.get('timestamp'):
            self.failures.append("NTIA: Missing SBOM timestamp")
            valid = False
        
        # Check author
        if not sbom_data.get('author'):
            self.failures.append("NTIA: Missing SBOM author/creator")
            valid = False
        
        # Check dependencies have required fields
        if not sbom_data.get('dependencies'):
            self.failures.append("NTIA: No components declared")
            valid = False
        
        return valid
    
    def validate_completeness(self, sbom_data: Dict) -> bool:
        """Compare SBOM against actual JAR dependencies"""
        actual_deps = self.extract_jar_dependencies()
        declared_deps = sbom_data['dependencies']
        
        missing = actual_deps - declared_deps
        
        if missing:
            self.failures.append(
                f"Completeness: {len(missing)} direct dependencies not disclosed in SBOM"
            )
            for dep in sorted(missing)[:10]:  # Show first 10
                self.failures.append(f"  Missing: {dep}")
            if len(missing) > 10:
                self.failures.append(f"  ... and {len(missing) - 10} more")
            return False
        
        return True
    
    def check_vulnerabilities(self, sbom_data: Dict) -> List[str]:
        """Check declared dependencies against NVD"""
        vulnerable = []
        
        # NOTE: This is a simplified check. Production version should:
        # - Use local CVE database or commercial feed
        # - Implement rate limiting for NVD API
        # - Cache results
        
        # Only entries shaped like group:artifact:version can be looked up
        checkable = [dep for dep in sbom_data['dependencies'] if len(dep.split(':')) >= 3]
        
        # This is where you'd query your vulnerability database and append
        # each finding to `vulnerable`. The template records a single reminder.
        if checkable:
            self.warnings.append(
                f"Vulnerability check not implemented: {len(checkable)} dependencies unchecked "
                "(integrate with your CVE database or OSV.dev API)"
            )
        
        return vulnerable
    
    def run_validation(self) -> bool:
        """Execute all validation checks"""
        print(f"Validating SBOM: {self.sbom_path}")
        print(f"Against build artifacts in: {self.build_dir}\n")
        
        try:
            sbom_data = self.load_sbom()
            print(f"✓ Loaded {sbom_data['format']} SBOM")
            print(f"✓ Found {len(sbom_data['dependencies'])} declared dependencies\n")
            
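            # Run checks; each one records its findings in self.failures or self.warnings
            self.validate_ntia_minimum(sbom_data)
            self.validate_completeness(sbom_data)
            for finding in self.check_vulnerabilities(sbom_data):
                self.failures.append(f"Vulnerability: {finding}")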
            
            # Report results
            if self.failures:
                print("VALIDATION FAILURES:")
                for failure in self.failures:
                    print(f"  ✗ {failure}")
                print()
            
            if self.warnings:
                print("WARNINGS:")
                for warning in self.warnings:
                    print(f"  ⚠ {warning}")
                print()
            
            if not self.failures:
                print("✓ SBOM validation passed")
                return True
            else:
                print(f"✗ SBOM validation failed with {len(self.failures)} error(s)")
                return False
                
        except Exception as e:
            print(f"✗ Validation error: {str(e)}")
            return False

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: validate_sbom.py <sbom_file> <build_directory>")
        sys.exit(1)
    
    validator = SBOMValidator(sys.argv[1], sys.argv[2])
    success = validator.run_validation()
    sys.exit(0 if success else 1)
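
The validate_ntia_minimum() method above looks only at SBOM-level fields (timestamp, author, presence of components). A minimal sketch of a per-component check for CycloneDX JSON follows. The function name find_ntia_gaps is hypothetical, and it assumes that your generator records the supplier under supplier.name and that a purl or CPE serves as the unique identifier; adjust both to match your tooling.

```python
from typing import Dict, List


def find_ntia_gaps(sbom: Dict) -> List[str]:
    """Return one message per missing NTIA element in a CycloneDX document."""
    gaps = []
    for comp in sbom.get("components", []):
        label = comp.get("name") or comp.get("bom-ref") or "<unnamed>"
        if not comp.get("name"):
            gaps.append(f"{label}: missing component name")
        if not comp.get("version"):
            gaps.append(f"{label}: missing version")
        # Assumption: the supplier is recorded as {"supplier": {"name": ...}}
        if not (comp.get("supplier") or {}).get("name"):
            gaps.append(f"{label}: missing supplier name")
        # Assumption: a purl or CPE counts as the unique identifier
        if not (comp.get("purl") or comp.get("cpe")):
            gaps.append(f"{label}: missing unique identifier (purl or cpe)")
    # Dependency relationships live in the top-level "dependencies" array
    if not sbom.get("dependencies"):
        gaps.append("SBOM: no dependency relationships declared")
    return gaps


if __name__ == "__main__":
    sample = {"components": [{"name": "example-lib", "version": "1.0.0"}]}
    for gap in find_ntia_gaps(sample):
        print(gap)
```

Each returned message could be appended to self.failures with an "NTIA:" prefix so that it appears in the existing report.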

Customization Options

For your CI/CD pipeline, add this step after SBOM generation:

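# GitLab CI example (assumes a "verify" stage is declared under stages:)
validate-sbom:
  stage: verify
  script:
    # The script prints its report to the job log and exits nonzero on failure.
    # It does not write a JUnit XML file, so no reports:junit artifact is declared.
    - python3 validate_sbom.py target/bom.json target/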
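
GitLab can only pick up a reports: junit artifact if the job actually writes that file, and the validator as shown only prints to stdout. A minimal sketch of a report writer follows. The function name write_junit_report and the one-test-case-per-failure layout are assumptions rather than part of the original script.

```python
import xml.etree.ElementTree as ET
from typing import List


def write_junit_report(failures: List[str],
                       path: str = "sbom-validation-report.xml") -> None:
    """Write a single-suite JUnit XML file with one test case per failure."""
    suite = ET.Element(
        "testsuite",
        name="sbom-validation",
        tests=str(max(len(failures), 1)),
        failures=str(len(failures)),
    )
    if not failures:
        # A clean run is reported as one passing test case
        ET.SubElement(suite, "testcase", classname="sbom", name="validation")
    for index, message in enumerate(failures, start=1):
        case = ET.SubElement(suite, "testcase", classname="sbom",
                             name=f"check-{index}")
        ET.SubElement(case, "failure", message=message)
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)
```

Calling write_junit_report(validator.failures) right after run_validation() would produce the file; the job can then declare it under artifacts: reports: junit with when: always so the report is kept for failed runs too.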

For vulnerability checking, replace the placeholder in check_vulnerabilities() with your actual CVE data source. Options include:

  • OSV.dev API for open-source vulnerability data
  • Sonatype OSS Index (requires registration)
  • Your commercial vulnerability scanner's API
  • Local mirror of NVD data feeds
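
As one illustration of the first option, the sketch below queries the OSV.dev query endpoint for a single group:artifact:version string. The helper names (build_osv_query, post_json, find_vulnerabilities) are hypothetical. It uses only the standard library so it runs without extra packages, it does not follow paginated responses, and it adds no rate limiting or caching, all of which a production check would need.

```python
import json
import urllib.request
from typing import Callable, Dict, List

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(dep: str) -> Dict:
    """Turn 'group:artifact:version' into an OSV query body for Maven."""
    group_id, artifact_id, version = dep.split(":", 2)
    return {
        "package": {"name": f"{group_id}:{artifact_id}", "ecosystem": "Maven"},
        "version": version,
    }


def post_json(url: str, body: Dict) -> Dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def find_vulnerabilities(dep: str,
                         post: Callable[[str, Dict], Dict] = post_json) -> List[str]:
    """Return the advisory IDs OSV reports for one dependency."""
    result = post(OSV_QUERY_URL, build_osv_query(dep))
    # OSV omits the "vulns" key when nothing matches
    return [vuln["id"] for vuln in result.get("vulns", [])]
```

The post parameter exists so the lookup can be replaced by a stub in tests or by a client for a different data source. Inside check_vulnerabilities(), each returned ID could be appended to the vulnerable list together with the affected dependency.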

For custom compliance rules, add methods to the SBOMValidator class. For example, if your organization requires license information:

def validate_licenses(self, sbom_data: Dict) -> bool:
    """Check that all components declare licenses"""
    # Implementation depends on your SBOM format
    raise NotImplementedError("Add a license check for your SBOM format")
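
A possible starting point for the CycloneDX case is sketched below. It works on the raw SBOM JSON rather than on the dictionary returned by load_sbom(), because that dictionary does not carry per-component license fields. The function name components_without_license is hypothetical, and treating a license id, a license name, or an SPDX expression as sufficient is an assumption about your policy.

```python
from typing import Dict, List


def components_without_license(sbom: Dict) -> List[str]:
    """List CycloneDX components that declare no license id, name, or expression."""
    missing = []
    for comp in sbom.get("components", []):
        declared = False
        for entry in comp.get("licenses", []):
            license_obj = entry.get("license", {})
            if entry.get("expression") or license_obj.get("id") or license_obj.get("name"):
                declared = True
                break
        if not declared:
            missing.append(
                f"{comp.get('group', '')}:{comp.get('name', '')}:{comp.get('version', '')}"
            )
    return missing
```

A validate_licenses() method could call this helper, append one failure per returned component, and return whether the list is empty.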

For different build tools, modify extract_jar_dependencies():

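  • Gradle: Scan build/libs/*.jar. JARs built by Gradle usually carry no META-INF/maven/ metadata, so take coordinates from the Gradle dependency report or from the JAR file names instead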
  • Ant: Look for Ivy metadata or custom manifest entries
  • Manual builds: Point to your dependency directory structure
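
For JARs that carry no pom.properties, one fallback is to read the artifact name and version from the file name. The sketch below assumes the common artifact-version.jar naming pattern; the names JAR_NAME, parse_jar_name, and scan_dependency_dir are hypothetical. A file name carries no group ID, so the result can only be matched on artifact and version, and artifact IDs that contain a segment starting with a digit will be split in the wrong place.

```python
import re
from pathlib import Path
from typing import Optional, Set, Tuple

# Heuristic: '<artifact>-<version>.jar', where the version starts with a digit
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def parse_jar_name(file_name: str) -> Optional[Tuple[str, str]]:
    """Split a JAR file name into (artifact, version), or return None."""
    match = JAR_NAME.match(file_name)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def scan_dependency_dir(directory: str) -> Set[str]:
    """Collect 'artifact:version' strings for every parsable JAR under a directory."""
    found = set()
    for jar in Path(directory).rglob("*.jar"):
        parsed = parse_jar_name(jar.name)
        if parsed:
            found.add(f"{parsed[0]}:{parsed[1]}")
    return found
```

To compare against the SBOM, the declared group:artifact:version strings would need to be reduced to artifact:version as well before taking the set difference.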

Validation Steps

Run this validation on every build:

  1. Local Testing — Run against a known-good SBOM first: python3 validate_sbom.py samples/good-sbom.json samples/build/

  2. Pipeline Integration — Add to your CI configuration. The script exits with code 1 on failure, which will fail the build.

  3. Failure Triage — When validation fails, check the output:

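    • "Completeness: N direct dependencies not disclosed in SBOM" → Your SBOM generator isn't scanning deeply enough. Check its configuration for dependency scope settings. If the listed artifacts look wrong, verify that your build directory path is correct and contains all compiled artifacts.
    • "NTIA: Missing SBOM author/creator" or "NTIA: Missing SBOM timestamp" → Add the missing metadata to your SBOM generator configuration.
    • "NTIA: No components declared" → The script found no component it could identify. Check that the generator writes a group, name, and version for each component.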
  4. Baseline Establishment — Run against your last 10 releases. If you find historical noncompliance, document it and set a remediation timeline. Don't block current releases for past issues.

  5. Quarterly Audit — Re-validate published SBOMs from production. Dependencies change post-build in some environments. Catch drift early.
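
For the drift check in step 5, a simple comparison of two SBOMs can show what changed between the published version and a freshly generated one. The sketch below assumes CycloneDX JSON and identifies components by group, name, and version; the function names component_keys and diff_sboms are hypothetical.

```python
from typing import Dict, Set, Tuple


def component_keys(sbom: Dict) -> Set[str]:
    """Build 'group:name:version' keys for every component in a CycloneDX SBOM."""
    keys = set()
    for comp in sbom.get("components", []):
        keys.add(
            f"{comp.get('group', '')}:{comp.get('name', '')}:{comp.get('version', '')}"
        )
    return keys


def diff_sboms(published: Dict, current: Dict) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) component keys between two SBOMs."""
    before = component_keys(published)
    after = component_keys(current)
    return after - before, before - after
```

A version bump shows up as one removed key and one added key for the same group and name, which makes upgrades easy to spot in the audit output.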

The study that analyzed 25,882 Java SBOMs found thousands of noncompliant artifacts already in production. Your validation script won't fix existing problems, but it prevents new ones from shipping. Start running it today—before your next release becomes part of that statistic.
