Software Bill of Materials: Understanding What You’re Actually Running
Software keeps growing more complicated. To speed up the development of new applications and features, we regularly import libraries for complex or tedious tasks that we would rather not implement ourselves: database connectors, web application frameworks, serialization libraries. The list of tools we need to remain highly productive goes on. And the libraries we import may pull in dependencies of their own, called transitive dependencies. This creates additional bloat and additional risks to our applications and businesses.
To track what services and libraries are in use, we can create a software bill of materials (SBOM): a document that lists the software an application uses along with various characteristics of that software. Two common formats are OWASP’s CycloneDX and the Linux Foundation’s Software Package Data Exchange (SPDX). Both formats are open source. CycloneDX is lighter weight while SPDX can be more verbose with information such as library licensing. A more comprehensive breakdown can be found here.
Once a list of materials is generated, we can start to analyze those materials. An important step is to check the listed components for publicly known vulnerabilities (e.g., CVEs). We can use tools such as Bomber to help us. Additionally, we can collect a list of software licenses to determine legal threats to the application’s business model.
Why Bother Making a Complete List?
While importing and using third-party libraries can make our development quicker and easier, they pose various risks to the security of our applications. Malicious updates and publicly known vulnerabilities are real risks to our information security.
A prime example of the risk present within applications and their dependencies is the 2017 Equifax breach. The primary vulnerability was a publicly known Remote Code Execution (RCE) vulnerability within Apache Struts, CVE-2017-5638. Using this vulnerability and other weaknesses, attackers exfiltrated the data of roughly 147 million people from Equifax servers.
Another older but instructive example is the left-pad incident in 2016. A widely used library was suddenly removed from the npm registry. CI/CD pipelines quickly failed because the imported package could not be found. Many people did not even know that they relied on the small library but were affected all the same. All of this was caused by a transitive dependency: a dependency imported by other libraries.
A third example is Log4Shell, the 2021 Log4j vulnerability. A critical vulnerability was discovered in the common Java logging library, potentially enabling malicious parties to execute arbitrary code. It became a dire situation in which teams needed to find every instance of the library in use to ensure patches went out quickly.
Having the ability to maintain a living document of services and dependencies within your applications helps when assessing risks.
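As a rough illustration of the "find every instance" problem, the sketch below searches a parsed SBOM for packages by name. It assumes the SPDX 2.x JSON layout, where a top-level `packages` array holds entries with `name` and `versionInfo` fields. The sample document, its package versions, and the `find_package` helper are made up for illustration.

```python
import json

# Hypothetical, trimmed-down SPDX 2.x JSON document (illustration only).
SAMPLE_SBOM = json.loads("""
{
  "packages": [
    {"name": "log4j-core", "versionInfo": "2.14.1"},
    {"name": "log4j-api", "versionInfo": "2.14.1"},
    {"name": "example-web", "versionInfo": "1.0.0"}
  ]
}
""")


def find_package(sbom, needle):
    """Return (name, version) pairs for packages whose name contains `needle`."""
    matches = []
    for pkg in sbom.get("packages", []):
        name = pkg.get("name", "")
        if needle.lower() in name.lower():
            # versionInfo is optional in SPDX, so fall back to a placeholder.
            matches.append((name, pkg.get("versionInfo", "UNKNOWN")))
    return matches


if __name__ == "__main__":
    for name, version in find_package(SAMPLE_SBOM, "log4j"):
        print(f"{name} {version}")
```

With one SBOM per application, the same helper could be run over every stored document (loaded with `json.load`) to see which applications carry an affected library.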
In May 2021, President Biden signed Executive Order 14028 (Federal Register document 2021-10460) requesting guidance for critical software used by the federal government. Included within this order was a request for standards for incident response and monitoring. This guidance should describe how a vendor can “[generate] and, when requested by a purchaser, [provide] artifacts that demonstrate conformance to the processes set forth,” requiring SBOMs of such critical software to be created and made available to the purchaser. The likely motivation for these requirements was “to quickly and easily determine whether [operators] are at potential risk of a newly discovered vulnerability.”
The Risks Least Talked About
In addition to cybersecurity risks, libraries may pose legal risks. Is the license a non-commercial Creative Commons variant? Is it GPL, and you want your source to remain proprietary? Is there a custom license restricting the number of users or employees, such as Docker Desktop on Mac? While I’m certainly not a lawyer and legal battles can be rare, licensing poses a different sort of risk and should not be taken lightly.
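A first pass over license data can be automated. The sketch below is one possible approach, not a legal assessment: it assumes packages shaped like SPDX 2.x JSON entries (with `licenseConcluded` and `licenseDeclared` fields) and flags identifiers that match a small watch list. The watch list, the package names, and the `licenses_needing_review` helper are all illustrative assumptions; a real policy would come from your legal team.

```python
import re

# Assumption: an illustrative, incomplete watch list of SPDX license
# identifier prefixes that a proprietary, commercial product may want reviewed.
REVIEW_PREFIXES = ("GPL-", "AGPL-", "LGPL-", "CC-BY-NC")

# Operators that can appear inside an SPDX license expression.
OPERATORS = {"AND", "OR", "WITH"}


def licenses_needing_review(packages):
    """Map package name -> license identifiers that match the watch list."""
    flagged = {}
    for pkg in packages:
        # Look at both fields, since either one may be NOASSERTION or missing.
        expression = " ".join(
            pkg.get(field) or "" for field in ("licenseConcluded", "licenseDeclared")
        )
        # Split expressions such as "(MIT OR GPL-2.0-only)" into identifiers.
        tokens = [t for t in re.split(r"[\s()]+", expression) if t and t not in OPERATORS]
        # dict.fromkeys removes duplicates while keeping the original order.
        hits = [t for t in dict.fromkeys(tokens) if t.startswith(REVIEW_PREFIXES)]
        if hits:
            flagged[pkg.get("name", "?")] = hits
    return flagged


# Hypothetical package entries for illustration.
SAMPLE_PACKAGES = [
    {"name": "example-db", "licenseConcluded": "NOASSERTION",
     "licenseDeclared": "(MPL-2.0 OR EPL-1.0)"},
    {"name": "example-gpl-lib", "licenseConcluded": "GPL-3.0-only",
     "licenseDeclared": "GPL-3.0-only"},
    {"name": "example-dual", "licenseDeclared": "(MIT OR LGPL-2.1-or-later)"},
]

if __name__ == "__main__":
    for name, ids in licenses_needing_review(SAMPLE_PACKAGES).items():
        print(f"{name}: {', '.join(ids)}")
```

A match only means "have someone look at this." Dual-licensed packages, for example, may still be usable under the more permissive option.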
Generating an SBOM with Syft
Syft (https://github.com/anchore/syft) is a tool that generates SBOMs from container images or directories. We can output the SBOM in one of several formats, including CycloneDX (XML or JSON) and SPDX (tag-value or JSON). We’ll generate an SPDX JSON file.
After installing Syft, we’ll grab Spring PetClinic (https://github.com/spring-projects/spring-petclinic). This is a Spring example project. We will want to build with `mvn package` so that all dependencies are downloaded and set up for Syft to identify.
With the source pulled and the package built, we can now run Syft.
syft {project_dir} -o spdx-json > petclinic_spdx.json
This command produces the SPDX JSON output and writes it to petclinic_spdx.json. Inside, we’ll have a list of packages and versions, the files scanned, a list of licenses found, and the relationships between packages.
The contents of the file are generally straightforward. Packages contain versions and, if found, the license of the package. When navigating licenses or packages across the document, you may need to utilize a unique suffix Syft generates. Use this to navigate relationships or licensing when multiple versions of the same package are present.
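To make the relationship data easier to read, the identifiers can be resolved back to package names and versions. The sketch below assumes the SPDX 2.x JSON layout (`packages` entries with `SPDXID`, `name`, and `versionInfo`; `relationships` entries with `spdxElementId`, `relatedSpdxElement`, and `relationshipType`) and assumes the unique suffix is part of each `SPDXID`. The sample identifiers and the `describe_relationships` helper are invented for illustration; real Syft output will differ.

```python
# Hypothetical SPDX-style document with two versions of the same package.
SAMPLE_SBOM = {
    "packages": [
        {"SPDXID": "SPDXRef-Package-example-lib-aaaa1111",
         "name": "example-lib", "versionInfo": "1.0.0"},
        {"SPDXID": "SPDXRef-Package-example-lib-bbbb2222",
         "name": "example-lib", "versionInfo": "2.0.0"},
        {"SPDXID": "SPDXRef-Package-example-app-cccc3333",
         "name": "example-app", "versionInfo": "0.1.0"},
    ],
    "relationships": [
        {"spdxElementId": "SPDXRef-Package-example-app-cccc3333",
         "relatedSpdxElement": "SPDXRef-Package-example-lib-bbbb2222",
         "relationshipType": "DEPENDS_ON"},
        {"spdxElementId": "SPDXRef-DOCUMENT",
         "relatedSpdxElement": "SPDXRef-Package-example-app-cccc3333",
         "relationshipType": "DESCRIBES"},
    ],
}


def describe_relationships(sbom):
    """Render each relationship with name@version labels where possible."""
    # Index packages by their unique SPDXID so duplicate names stay distinct.
    by_id = {pkg["SPDXID"]: pkg for pkg in sbom.get("packages", [])}

    def label(spdx_id):
        pkg = by_id.get(spdx_id)
        if pkg is None:
            # Not a package (for example a file or the document itself).
            return spdx_id
        return f'{pkg["name"]}@{pkg.get("versionInfo", "UNKNOWN")}'

    return [
        f'{label(rel["spdxElementId"])} {rel["relationshipType"]} '
        f'{label(rel["relatedSpdxElement"])}'
        for rel in sbom.get("relationships", [])
    ]


if __name__ == "__main__":
    for line in describe_relationships(SAMPLE_SBOM):
        print(line)
```

Keying on the full identifier rather than the package name is what keeps the two `example-lib` versions apart.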
Analyzing the SBOM with Bomber
With the Syft output, we can now scan with Bomber (https://github.com/devops-kung-fu/bomber). Bomber looks at the packages in our generated SBOM and creates a list of publicly known vulnerabilities for each package. Bomber can output in two formats: JSON and HTML. The HTML output gives a human-friendly visual similar to OWASP Dependency-Check, while the JSON output is a bit more “complete” (I’ll show why in a moment). Both outputs include a summary of what was found.
When looking at the HTML output, each package will be listed individually, assuming CVEs were found, and each finding will be listed with a risk rating. A description of the findings is also included.
One thing to note: when no package version is specified, Bomber will create a list of all CVEs for that package. If your future projects have a large list of findings, be aware that many may not apply. In the PetClinic scan, the H2 version installed was 2.2.224, but the pom.xml file did not set an explicit version. All reported CVEs were from 2.1.214 and prior. Erring on the side of caution is preferable to staying silent about potential security issues. Just be aware that the report outputs should not be taken as 100% fact when versions are missing.
I also stated that the JSON output is more “complete.” In the HTML report, individual findings may lack the associated CVE identifier. This is not an issue with the JSON output. With a little scripting or the use of a tool such as jq, the CVE identifiers can be extracted quickly.
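One way to anticipate this is to list the SBOM packages that carry no version before reading the vulnerability report. The sketch below assumes SPDX 2.x JSON, where `versionInfo` is an optional field, and treats a missing or blank value as "no version". How Syft actually records an unversioned package is an assumption here, and the sample data and the `packages_without_version` helper are hypothetical.

```python
# Hypothetical SPDX-style package list for illustration.
SAMPLE_SBOM = {
    "packages": [
        {"name": "example-db", "versionInfo": ""},
        {"name": "example-web", "versionInfo": "3.2.0"},
        {"name": "example-util"},
    ]
}


def packages_without_version(sbom):
    """Return sorted names of packages whose versionInfo is missing or blank."""
    return sorted(
        pkg.get("name", "?")
        for pkg in sbom.get("packages", [])
        if not str(pkg.get("versionInfo") or "").strip()
    )


if __name__ == "__main__":
    for name in packages_without_version(SAMPLE_SBOM):
        print(f"{name}: no version recorded, review its findings manually")
```

Findings for any package on this list deserve a manual check against the version that is actually installed.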
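As a sketch of that scripting, the snippet below collects CVE identifiers from a parsed JSON report. To avoid depending on Bomber's exact schema, which is not reproduced here, it walks every string value in the document and matches the standard `CVE-YYYY-NNNN` pattern. The sample report layout and the `collect_cve_ids` helper are invented for illustration.

```python
import re

# CVE identifiers are "CVE-", a four-digit year, then four or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

# Hypothetical report layout; a real Bomber report will be shaped differently.
SAMPLE_REPORT = {
    "packages": [
        {"name": "example-struts",
         "vulnerabilities": [{"id": "CVE-2017-5638", "severity": "CRITICAL"}]},
        {"name": "example-logger",
         "vulnerabilities": [{"title": "Details for CVE-2021-44228"}]},
    ]
}


def collect_cve_ids(node, found=None):
    """Recursively gather CVE identifiers from every string value in `node`."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_cve_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_cve_ids(item, found)
    elif isinstance(node, str):
        found.update(CVE_PATTERN.findall(node))
    return found


if __name__ == "__main__":
    for cve_id in sorted(collect_cve_ids(SAMPLE_REPORT)):
        print(cve_id)
```

For a real report, load the file with `json.load` and pass the result to `collect_cve_ids`. Because the walk ignores structure, it also picks up identifiers that appear only inside titles or descriptions.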
Conclusion
A software bill of materials can help identify various risks posed to your organization and software. From cybersecurity vulnerabilities to legal risks, these documents should be created and regularly updated to quickly assess these risks as time goes on. Open-source tools exist that make this easy to accomplish and, with a little JSON parsing, could be integrated into additional tooling and reporting.
Sean Lyford is a Senior Security Consultant with Cloud Security Partners. He has over 11 years of experience within the information security and development fields. Sean focuses on application and cloud security practices.
Sean has a career of both application security consulting and software engineering. As a software engineer, Sean has experience with high-level web applications, AI/ML integrations, and network application development. With his experience as a software engineer, Sean is able to effectively communicate with development teams and provide remediation guidance and prioritization.
In his free time, Sean spoils good walks (i.e., plays golf) and enjoys video games.