Your organization now requires a Software Bill of Materials (SBOM) for every release, and your CI/CD pipeline generates them automatically. However, a recent study of 25,882 Java SBOMs revealed that 7,907 failed to disclose direct dependencies, with 4.97% of those omissions containing known vulnerabilities.
This validation script helps your team catch noncompliant SBOMs before they leave your pipeline. It checks against NTIA minimum requirements and flags common compliance failures specific to Java projects.
Script Capabilities
This Python script validates Java SBOMs (CycloneDX or SPDX format) with three compliance checks:
- Direct Dependency Completeness: Compares your SBOM against the Maven coordinates (pom.properties) found in the JARs in your build artifacts.
- NTIA Minimum Elements: Checks for the SBOM author and timestamp and confirms that components are declared. The remaining NTIA elements (supplier name, component name, version, unique identifier, dependency relationships) need per-component checks that you add for your SBOM format.
- Vulnerability Disclosure: Provides a placeholder hook for cross-referencing declared dependencies against known CVE databases. You connect it to your own data source.
The script outputs a compliance report with specific failures and remediation guidance. Run it in your CI pipeline after SBOM generation but before artifact publication.
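As a rough illustration of what a per-component NTIA check can look like, here is a minimal sketch for CycloneDX JSON. The function name find_ntia_component_gaps is hypothetical, and the mapping from CycloneDX fields (supplier or publisher, purl or cpe, the top-level dependencies array) to NTIA elements is an assumption you should adjust to your own policy.

```python
from typing import Dict, List


def find_ntia_component_gaps(sbom: Dict) -> List[str]:
    """Return one message per NTIA-relevant field missing from a CycloneDX SBOM.

    Hypothetical helper: the field-to-element mapping below is an assumption.
    """
    gaps = []
    for comp in sbom.get("components", []):
        label = comp.get("name") or comp.get("bom-ref") or "<unnamed>"
        if not comp.get("name"):
            gaps.append(f"{label}: missing component name")
        if not comp.get("version"):
            gaps.append(f"{label}: missing version")
        # 'supplier' is an object with a 'name'; 'publisher' is a plain string
        supplier = (comp.get("supplier") or {}).get("name") or comp.get("publisher")
        if not supplier:
            gaps.append(f"{label}: missing supplier name")
        if not (comp.get("purl") or comp.get("cpe")):
            gaps.append(f"{label}: missing unique identifier (purl or cpe)")
    # Dependency relationships live in the top-level 'dependencies' array
    if not sbom.get("dependencies"):
        gaps.append("SBOM: no dependency relationships declared")
    return gaps
```

Each returned message could be appended to the validator's failures list so that it shows up in the compliance report.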
Prerequisites
Before running this script, ensure you have:
- Python 3.9 or later
- Access to your build output directory (where compiled JARs reside)
- A generated SBOM file (CycloneDX JSON or SPDX JSON format)
- Network access to query the NVD API (or a local vulnerability database)
- Installed dependencies:
pip install cyclonedx-python-lib spdx-tools requests
Your SBOM generator must already be integrated into your build process. This script validates output—it doesn't create SBOMs. If you're using Maven, ensure cyclonedx-maven-plugin or spdx-maven-plugin runs before this validation step.
The Validation Script
#!/usr/bin/env python3
"""
SBOM Compliance Validator for Java Projects
Validates against NTIA minimum requirements and dependency completeness
"""
import json
import os
import sys
import zipfile
from pathlib import Path
from typing import Set, Dict, List
import requests
from datetime import datetime


class SBOMValidator:
    def __init__(self, sbom_path: str, build_dir: str):
        self.sbom_path = Path(sbom_path)
        self.build_dir = Path(build_dir)
        self.failures = []
        self.warnings = []

    def extract_jar_dependencies(self) -> Set[str]:
        """Extract actual dependencies from JAR manifests and pom.properties"""
        dependencies = set()
        for jar_file in self.build_dir.rglob("*.jar"):
            try:
                with zipfile.ZipFile(jar_file, 'r') as jar:
                    # Check META-INF for Maven metadata
                    for name in jar.namelist():
                        if name.startswith('META-INF/maven/') and name.endswith('pom.properties'):
                            props = jar.read(name).decode('utf-8')
                            # Parse groupId:artifactId:version
                            group_id = self._extract_property(props, 'groupId')
                            artifact_id = self._extract_property(props, 'artifactId')
                            version = self._extract_property(props, 'version')
                            if all([group_id, artifact_id, version]):
                                dependencies.add(f"{group_id}:{artifact_id}:{version}")
            except Exception as e:
                self.warnings.append(f"Could not parse {jar_file.name}: {str(e)}")
        return dependencies

    def _extract_property(self, props: str, key: str) -> str:
        """Extract property value from properties file content"""
        for line in props.split('\n'):
            if line.startswith(f"{key}="):
                return line.split('=', 1)[1].strip()
        return ""

    def load_sbom(self) -> Dict:
        """Load and parse SBOM file"""
        with open(self.sbom_path, 'r') as f:
            sbom = json.load(f)
        # Detect format
        if 'bomFormat' in sbom and sbom['bomFormat'] == 'CycloneDX':
            return self._parse_cyclonedx(sbom)
        elif 'spdxVersion' in sbom:
            return self._parse_spdx(sbom)
        else:
            raise ValueError("Unknown SBOM format. Expected CycloneDX or SPDX.")

    def _parse_cyclonedx(self, sbom: Dict) -> Dict:
        """Extract relevant fields from CycloneDX SBOM"""
        components = sbom.get('components', [])
        dependencies = set()
        for comp in components:
            group = comp.get('group', '')
            name = comp.get('name', '')
            version = comp.get('version', '')
            if group and name and version:
                dependencies.add(f"{group}:{name}:{version}")
        return {
            'format': 'CycloneDX',
            'dependencies': dependencies,
            'metadata': sbom.get('metadata', {}),
            'timestamp': sbom.get('metadata', {}).get('timestamp'),
            'author': sbom.get('metadata', {}).get('authors', [])
        }

    def _parse_spdx(self, sbom: Dict) -> Dict:
        """Extract relevant fields from SPDX SBOM"""
        packages = sbom.get('packages', [])
        dependencies = set()
        for pkg in packages:
            name = pkg.get('name', '')
            version = pkg.get('versionInfo', '')
            # SPDX often uses name format: group:artifact
            if ':' in name and version:
                dependencies.add(f"{name}:{version}")
        return {
            'format': 'SPDX',
            'dependencies': dependencies,
            'metadata': sbom,
            'timestamp': sbom.get('creationInfo', {}).get('created'),
            'author': sbom.get('creationInfo', {}).get('creators', [])
        }

    def validate_ntia_minimum(self, sbom_data: Dict) -> bool:
        """Check NTIA minimum elements"""
        valid = True
        # Check timestamp
        if not sbom_data.get('timestamp'):
            self.failures.append("NTIA: Missing SBOM timestamp")
            valid = False
        # Check author
        if not sbom_data.get('author'):
            self.failures.append("NTIA: Missing SBOM author/creator")
            valid = False
        # Check dependencies have required fields
        if not sbom_data.get('dependencies'):
            self.failures.append("NTIA: No components declared")
            valid = False
        return valid

    def validate_completeness(self, sbom_data: Dict) -> bool:
        """Compare SBOM against actual JAR dependencies"""
        actual_deps = self.extract_jar_dependencies()
        declared_deps = sbom_data['dependencies']
        missing = actual_deps - declared_deps
        if missing:
            self.failures.append(
                f"Completeness: {len(missing)} direct dependencies not disclosed in SBOM"
            )
            for dep in sorted(missing)[:10]:  # Show first 10
                self.failures.append(f"  Missing: {dep}")
            if len(missing) > 10:
                self.failures.append(f"  ... and {len(missing) - 10} more")
            return False
        return True

    def check_vulnerabilities(self, sbom_data: Dict) -> List[str]:
        """Check declared dependencies against NVD"""
        vulnerable = []
        # NOTE: This is a simplified check. Production version should:
        # - Use local CVE database or commercial feed
        # - Implement rate limiting for NVD API
        # - Cache results
        for dep in sbom_data['dependencies']:
            # Parse component
            parts = dep.split(':')
            if len(parts) < 3:
                continue
            # This is where you'd query your vulnerability database
            # For this template, we'll just flag the structure
            self.warnings.append(
                f"Vulnerability check needed for: {dep} "
                "(integrate with your CVE database or OSV.dev API)"
            )
        return vulnerable

    def run_validation(self) -> bool:
        """Execute all validation checks"""
        print(f"Validating SBOM: {self.sbom_path}")
        print(f"Against build artifacts in: {self.build_dir}\n")
        try:
            sbom_data = self.load_sbom()
            print(f"✓ Loaded {sbom_data['format']} SBOM")
            print(f"✓ Found {len(sbom_data['dependencies'])} declared dependencies\n")
            # Run checks
            ntia_valid = self.validate_ntia_minimum(sbom_data)
            complete = self.validate_completeness(sbom_data)
            # Report results
            if self.failures:
                print("VALIDATION FAILURES:")
                for failure in self.failures:
                    print(f"  ✗ {failure}")
                print()
            if self.warnings:
                print("WARNINGS:")
                for warning in self.warnings:
                    print(f"  ⚠ {warning}")
                print()
            if not self.failures:
                print("✓ SBOM validation passed")
                return True
            else:
                print(f"✗ SBOM validation failed with {len(self.failures)} error(s)")
                return False
        except Exception as e:
            print(f"✗ Validation error: {str(e)}")
            return False


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: validate_sbom.py <sbom_file> <build_directory>")
        sys.exit(1)
    validator = SBOMValidator(sys.argv[1], sys.argv[2])
    success = validator.run_validation()
    sys.exit(0 if success else 1)
Customization Options
For your CI/CD pipeline, add this step after SBOM generation:
# GitLab CI example
validate-sbom:
  stage: verify
  script:
    - python3 validate_sbom.py target/bom.json target/
  artifacts:
    when: always
    reports:
      junit: sbom-validation-report.xml
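The job above collects sbom-validation-report.xml, but the script as listed only prints to the console. A minimal sketch of producing that file from the validator's failure messages follows. The helper name write_junit_report is hypothetical, and the element layout assumes the common JUnit XML shape (one testsuite with a testcase per failure), which you should confirm against what your CI system accepts.

```python
import xml.etree.ElementTree as ET
from typing import List


def write_junit_report(failures: List[str],
                       path: str = "sbom-validation-report.xml") -> None:
    """Write failure messages as a single JUnit-style test suite (hypothetical helper)."""
    suite = ET.Element(
        "testsuite",
        name="sbom-validation",
        tests=str(max(len(failures), 1)),
        failures=str(len(failures)),
    )
    if not failures:
        # Emit one passing test case so the report is not empty
        ET.SubElement(suite, "testcase", classname="sbom", name="validation")
    for index, message in enumerate(failures, start=1):
        case = ET.SubElement(suite, "testcase", classname="sbom", name=f"failure-{index}")
        ET.SubElement(case, "failure", message=message.strip())
    ET.ElementTree(suite).write(path, encoding="utf-8", xml_declaration=True)
```

One way to wire this in would be to call write_junit_report(validator.failures) right after run_validation() in the script's main block.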
For vulnerability checking, replace the placeholder in check_vulnerabilities() with your actual CVE data source. Options include:
- OSV.dev API for open-source vulnerability data
- Sonatype OSS Index (requires registration)
- Your commercial vulnerability scanner's API
- Local mirror of NVD data feeds
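As one possible starting point for the first option, the sketch below queries the OSV.dev query endpoint for a single group:artifact:version string. The function names are hypothetical, and the standard library's urllib is used here only to keep the example self-contained (requests would work equally well). Rate limiting, caching, and error handling are left out.

```python
import json
import urllib.request
from typing import Dict, List, Optional

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(dep: str) -> Optional[Dict]:
    """Turn 'group:artifact:version' into an OSV query body, or None if malformed."""
    parts = dep.split(":")
    if len(parts) != 3:
        return None
    group_id, artifact_id, version = parts
    return {
        "package": {"ecosystem": "Maven", "name": f"{group_id}:{artifact_id}"},
        "version": version,
    }


def extract_vuln_ids(response: Dict) -> List[str]:
    """Collect advisory IDs from an OSV response; no 'vulns' key means no matches."""
    return [vuln["id"] for vuln in response.get("vulns", []) if "id" in vuln]


def query_osv(dep: str, timeout: float = 10.0) -> List[str]:
    """Query OSV.dev for one dependency (makes a network call; hypothetical helper)."""
    payload = build_osv_query(dep)
    if payload is None:
        return []
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_vuln_ids(json.load(resp))
```

Inside check_vulnerabilities(), a call such as query_osv(dep) could replace the warning, with any returned IDs appended to the vulnerable list.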
For custom compliance rules, add methods to the SBOMValidator class. For example, if your organization requires license information:
def validate_licenses(self, sbom_data: Dict) -> bool:
    """Check that all components declare licenses"""
    # Implementation depends on your SBOM format
    pass
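One way such a license check might look for CycloneDX JSON is sketched below. It works on the raw SBOM dictionary rather than the parsed sbom_data, because the parser above does not keep license fields. The function name find_unlicensed_components is hypothetical, and the check assumes the CycloneDX layout in which each entry of a component's licenses array holds either a license object (with id or name) or an expression.

```python
from typing import Dict, List


def find_unlicensed_components(sbom: Dict) -> List[str]:
    """Return coordinates of CycloneDX components with no declared license.

    Hypothetical helper; assumes the CycloneDX 'licenses' array layout.
    """
    missing = []
    for comp in sbom.get("components", []):
        declared = False
        for entry in comp.get("licenses", []):
            lic = entry.get("license", {})
            if entry.get("expression") or lic.get("id") or lic.get("name"):
                declared = True
                break
        if not declared:
            missing.append(
                f"{comp.get('group', '')}:{comp.get('name', '')}:{comp.get('version', '')}"
            )
    return missing
```

A validate_licenses() method could call this with the loaded JSON and append one failure per returned coordinate.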
For different build tools, modify extract_jar_dependencies():
- Gradle: Parse build/libs/*.jar and check for META-INF/gradle/
- Ant: Look for Ivy metadata or custom manifest entries
- Manual builds: Point to your dependency directory structure
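For JARs that carry no Maven metadata, manifest entries are one fallback. The sketch below reads the main section of META-INF/MANIFEST.MF and builds a coordinate from Implementation-Vendor-Id, Implementation-Title, and Implementation-Version. Treating those three attributes as group, artifact, and version is an assumption that only holds if your build actually writes them that way, and both function names are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Dict, Optional


def read_manifest_attributes(jar_path: Path) -> Dict[str, str]:
    """Return main-section attributes of META-INF/MANIFEST.MF (empty if absent)."""
    attributes: Dict[str, str] = {}
    with zipfile.ZipFile(jar_path) as jar:
        if "META-INF/MANIFEST.MF" not in jar.namelist():
            return attributes
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
    # Long values wrap onto continuation lines that start with a single space
    text = text.replace("\r\n", "\n").replace("\n ", "")
    for line in text.split("\n"):
        if not line.strip():
            break  # a blank line ends the main section
        key, sep, value = line.partition(": ")
        if sep:
            attributes[key] = value.strip()
    return attributes


def coordinate_from_manifest(jar_path: Path) -> Optional[str]:
    """Build 'vendorId:title:version' from manifest attributes (assumed mapping)."""
    attrs = read_manifest_attributes(jar_path)
    vendor = attrs.get("Implementation-Vendor-Id")
    title = attrs.get("Implementation-Title")
    version = attrs.get("Implementation-Version")
    if vendor and title and version:
        return f"{vendor}:{title}:{version}"
    return None
```

In extract_jar_dependencies(), this could serve as a fallback for JARs in which no pom.properties entry is found.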
Validation Steps
Run this validation on every build:
1. Local Testing: Run against a known-good SBOM first.
python3 validate_sbom.py samples/good-sbom.json samples/build/
2. Pipeline Integration: Add to your CI configuration. The script exits with code 1 on failure, which will fail the build.
3. Failure Triage: When validation fails, check the output:
- "Completeness: N direct dependencies not disclosed in SBOM" → Your SBOM generator isn't scanning deeply enough. Check its configuration for dependency scope settings.
- "NTIA: Missing SBOM author/creator" → Add author metadata to your SBOM generator configuration.
- Unexpected "Completeness" failures → Verify that your build directory path is correct and contains all compiled artifacts.
4. Baseline Establishment: Run against your last 10 releases. If you find historical noncompliance, document it and set a remediation timeline. Don't block current releases for past issues.
5. Quarterly Audit: Re-validate published SBOMs from production. Dependencies change post-build in some environments. Catch drift early.
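For the audit step, drift can be expressed as a comparison between the dependency set recorded in a published SBOM and the set extracted from what is currently deployed. The sketch below is one way to do that with the group:artifact:version strings the validator already produces. The function names are hypothetical, and it assumes each group:artifact appears with only one version per set.

```python
from typing import Dict, List, Set, Tuple


def split_coordinate(dep: str) -> Tuple[str, str]:
    """Split 'group:artifact:version' into ('group:artifact', 'version')."""
    name, _, version = dep.rpartition(":")
    return name, version


def diff_dependency_sets(published: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Report dependencies added, removed, or changed in version since publication.

    Hypothetical helper; duplicate versions of one artifact collapse to a single entry.
    """
    old = dict(split_coordinate(dep) for dep in published)
    new = dict(split_coordinate(dep) for dep in current)
    return {
        "added": sorted(f"{name}:{new[name]}" for name in new.keys() - old.keys()),
        "removed": sorted(f"{name}:{old[name]}" for name in old.keys() - new.keys()),
        "changed": sorted(
            f"{name}: {old[name]} -> {new[name]}"
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        ),
    }
```

The two input sets could come from load_sbom()['dependencies'] on the published SBOM and extract_jar_dependencies() on the production artifacts.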
The study that analyzed 25,882 Java SBOMs found thousands of noncompliant artifacts already in production. Your validation script won't fix existing problems, but it prevents new ones from shipping. Start running it today—before your next release becomes part of that statistic.



