Why is PURL so important for software security?

Apr 02, 2026

Today, the great majority of cyberattacks exploit an unpatched vulnerability in a software product. While software vulnerabilities cannot be eliminated, they need to be managed so that the most serious risks are addressed. Successful vulnerability management starts with vulnerability identification:

The great majority of new vulnerabilities are reported by the supplier of the affected software product(s). While there are many formats used to report vulnerabilities, by far the most widely used format is CVE, which stands for “Common Vulnerabilities and Exposures”. CVE reports are known as CVE records; they are distributed and maintained by the CVE Program.
The CVE Program is run by the MITRE Corporation under a contract with the US Cybersecurity and Infrastructure Security Agency (CISA). MITRE reports to a nonprofit board composed of representatives of federal agencies and private organizations. The program operates ten Working Groups that are open to the public, although some of them have restricted membership.
CVE records must follow the CVE Record Format (also known as the CVE Schema), although in fact many, or even most, records do not follow it perfectly. The CVE Record Format is updated and maintained by the CVE Quality Working Group.
New CVE records are created by CVE Numbering Authorities (CNAs). As of March 30, 2026, there were 498 CNAs. Well-known CNAs include (in no particular order) Microsoft, Red Hat, GitHub, Oracle, HPE, Schneider Electric, Cisco, ENISA, JP-CERT, MITRE, and CISA.
If the developer of the vulnerable product is a CNA, they will create a new CVE record themselves. If the developer is not a CNA, they may request that a CNA, whose scope they fall under, create a new record. If a developer cannot find a CNA to create the record, they can approach one of the CNAs of Last Resort: Red Hat, MITRE or CISA.
New CVE records are added to the CVE.org database. As of March 30, 2026, that database contains 323,000 CVE records.
Vulnerability databases that support CVE download new CVE records from CVE.org and make them available for searches. Some vulnerability databases, including the National Vulnerability Database (NVD), enhance the records with additional information, although only the information added by the NVD is included in the CVE records themselves.
To identify vulnerabilities that affect software products they use, as well as learn about available patches for those vulnerabilities, end users and vulnerability service providers search vulnerability databases for CVEs that affect those products. They can then download the patches needed to protect the software products they use.

The CVE Program was introduced in 1999. In that year, 300 CVE records were created. With so few records, it was easy for end users to manage vulnerabilities, even without access to automated tools. However, as mentioned earlier, there are now over 320,000 CVE records. Today, successful vulnerability management requires automation. And automation requires that both vulnerabilities and affected software products be clearly identifiable using machine-readable identifiers.

Vulnerabilities are clearly identifiable using CVE numbers, as well as other vulnerability identifiers like GitHub Security Advisories (GHSA), Python Package Advisories (PyPA), and OSV[i]. However, the situation is very different when it comes to software identifiers.

At first glance, it might seem that identifying a software product is easy: simply a) use the name of the product, and b) since two vendors might sell a product with the same name, include the vendor name with the product name. However, software products are notorious for having different names at different stages of their lifecycle. For example:

1. A commercial product’s name may change when it is sold to a different vendor, or for marketing purposes.

2. The vendor of a commercial product, or the community that maintains an open source project, may refer to the same product with different names in different contexts.

3. The name of a commercial vendor may be stated in different ways, e.g. “Microsoft” vs. “Microsoft, Inc.”

4. The name of a commercial vendor may change because of a merger or acquisition.

5. A commercial software company is often referred to using many different names, e.g. “Microsoft, Inc.”, “Microsoft”, “Microsoft EMEA”, “Reversing Labs” (one of the thousands of companies that Microsoft has acquired), “Microsoft Productivity and Business Processes”, etc.

6. A vulnerability may only be found in a product when a certain option has been enabled, yet the name of the product is always the same, with or without the option.

One frequently suggested “solution” to this problem is to develop a massive database listing all regularly used names for a software product or a software developer, and link all of them to a single name for each product and each developer. However, there are many problems with that, including that a) developing, and especially maintaining, such a database would be fantastically expensive, and b) the employees or divisions of a large company seldom agree on a single name for the company. It’s useless for an outsider to try to decide that for them.

The best solution to this problem is to utilize a software identifier that doesn’t try to decide what is the “correct” name for a software product or for a software vendor. Instead, the identifier “delegates” responsibility for deciding the correct name to the organization that has the biggest responsibility for making sure a product’s name is clear and unambiguous.

In the case of “packaged” open source software, this is the operator of the package manager that distributes the software. In the case of commercial software, this is the organization that developed and/or sells the product. In both cases, that organization can be said to “control the namespace” for the product. A software identifier that delegates naming authority like this could be called a “distributed namespace”, as opposed to the centralized namespace described above.

Given that a centralized database of software names would be prohibitively expensive, how can a user searching a vulnerability database be certain that the name they search for is the definitive name for a particular product and version (hereinafter called “product/version”)? After all, the user wants to be sure that, by searching for the name of a software product, they will discover all vulnerabilities that have been identified as applicable to that product/version.

For this to happen, there needs to be a single machine-readable identifier for a vulnerable product/version. The CNA needs to include that identifier in the CVE record and the user needs to use it to search the vulnerability database. If all CNAs utilize a single identifier to refer to that product/version and a user searches the database using the same identifier, the user should be able to learn about all vulnerabilities that apply to that product/version.
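To make the point concrete, here is a minimal sketch of how an automated search behaves. It is not how any particular vulnerability database is implemented; the record IDs, the product, and the identifier string "example:acme/widget@2.0" are all made up for illustration. The only assumption is that the search is an exact match on the identifier, which is what automation requires.

```python
from collections import defaultdict

# Hypothetical records: (vulnerability ID, product identifier the reporter chose).
# All three describe the same product/version, but the third reporter used
# a free-text name instead of the shared identifier.
records = [
    ("VULN-0001", "example:acme/widget@2.0"),
    ("VULN-0002", "example:acme/widget@2.0"),
    ("VULN-0003", "Acme Widget Pro, version 2.0"),
]


def build_index(records):
    """Group vulnerability IDs by the exact identifier string in each record."""
    index = defaultdict(list)
    for vuln_id, product_id in records:
        index[product_id].append(vuln_id)
    return index


index = build_index(records)

# The user searches with the shared identifier and finds only the records
# whose reporters used that same string.
found = index.get("example:acme/widget@2.0", [])
print(found)  # VULN-0003 is missing from the result
```

The third record is not wrong in any human sense, but an exact-match search never sees it. That is why the reporter and the searcher must arrive at the same identifier independently.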

How can we ensure that, barring an error, the CNA and the user will always utilize the same identifier to refer to a product/version? We can do that if the identifier is based on information that a) the supplier will always know before they report a vulnerability for their product, and that b) the user will know (or can easily discover) before they search for that product in a vulnerability database.

A good analogy for this is the case of the formula for a chemical compound. If a chemist has identified a compound whose molecules consist of two hydrogen atoms and one oxygen atom, the chemist will write it as “H2O” (of course, the “2” is normally written as a subscript). Every other chemist will recognize that as water. Similarly, a compound of one sodium and one chlorine atom is NaCl, which is table salt. Note that all chemists can create and interpret these identifiers without having to look them up in a central database. A chemist who reads “NaCl” always knows which compound that refers to.

There is a software identifier that works in the same way. It’s called “PURL”, which stands for “package URL”. It is in widespread use as an identifier in vulnerability databases for open source software that is made available for download through package managers (there is another class of open source software that isn’t available through package managers; C and C++ programs fall in this category).

To create a PURL for an open source product, the supplier or user only needs to know the package manager name (such as PyPI or Maven Central), the product name in that package manager, and the version number (usually called a “version string”). Because every product name/version string combination will always be unique within one package manager, the PURL that includes those three pieces of information is guaranteed to be unique (of course, the same product/version might be available in a different package manager, but it will always have a different PURL, since the package manager is different).

A PURL will also always point to the same codebase, provided the supplier of the product (which could be an open source community) maintains a strict policy of never changing an existing version without also changing the version number. Under that policy, the combination of product name and version string never changes for a given codebase.

For example, the PURL for version 1.11.1 of the package named “django” in the PyPI package manager is “pkg:pypi/django@1.11.1”. If a CNA is creating a CVE record for that package, they can verify the name and version string by downloading the package from PyPI. By the same token, when a user of django version 1.11.1 wants to learn about vulnerabilities that have been identified in that package, they should already have access to the package manager name, package name and version string, since after all that’s where they obtained the software they’re using. Barring a mistake, the PURL they create to search for this package/version will always match the PURL the CNA used to identify the same package/version.
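The django example can be sketched in a few lines of Python. This is a simplified illustration, not a complete PURL implementation: the function name make_purl is my own, and it ignores the optional parts of a PURL (namespace, qualifiers and subpath). It assumes the normalization rule for the pypi type as I understand it from the PURL specification (lowercase the name and replace underscores with dashes); other package types have their own rules, which are not handled here.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str) -> str:
    """Build a minimal PURL of the form pkg:<type>/<name>@<version>.

    Simplified sketch: no namespace, qualifiers or subpath.
    """
    pkg_type = pkg_type.strip().lower()
    name = name.strip()
    if pkg_type == "pypi":
        # Assumed pypi rule: names are case-insensitive and "_" equals "-".
        name = name.lower().replace("_", "-")
    # Percent-encode anything that is not a letter, digit or one of "_.-~".
    return f"pkg:{pkg_type}/{quote(name, safe='')}@{quote(version.strip(), safe='')}"


# The CNA and the user each start from what they already know.
cna_purl = make_purl("PyPI", "Django", "1.11.1")
user_purl = make_purl("pypi", "django", "1.11.1")

print(cna_purl)               # pkg:pypi/django@1.11.1
print(cna_purl == user_purl)  # True
```

Neither party had to look anything up in a central database. Both derived the identifier from the package manager, the package name and the version string.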

Besides PURL, the only other software identifier in widespread use is CPE, which stands for “Common Platform Enumeration”. Without going into a lot of detail, CPE was until October 2025 the only identifier used in the National Vulnerability Database. It was developed more than 20 years ago by the National Institute of Standards and Technology (NIST), which operates the NVD.

Until February 2024, CPE names were created by NIST contractors working for the NVD and added to new CVE records to identify the product(s) described in the text of those records. Unfortunately, there is no way that anyone can predict with certainty what CPE the NIST contractor created. Some of the reasons why this is the case are described on pages 4-6 of the OWASP SBOM Forum’s 2022 white paper titled “A proposal to operationalize component identification for vulnerability management”. Because of problems like these, CPE is an unreliable identifier.

There is an even more serious problem with CPE: Since February 2024, the NVD staff has drastically reduced the number of CPEs it creates. The result is that over half of new CVE records entered since February 2024 do not have a CPE name attached to them. This makes those CVEs invisible to automated searches using a CPE name. Thus, a user who searches the NVD today using a product name and version string will on average see fewer than half of the CVEs that apply to that product/version.
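Here is a rough sketch of why a record without a CPE name is invisible to an automated search. The records, the vendor and the product are invented, and the proportions are chosen only to illustrate the idea; this is not NVD data, and the search function is a stand-in for whatever a real database does.

```python
# Hypothetical CPE name for a made-up product/version.
TARGET_CPE = "cpe:2.3:a:example_vendor:example_product:1.0:*:*:*:*:*:*:*"

# Five hypothetical records that all apply to the product above.
# Only two of them were ever given a CPE name.
records = [
    {"id": "VULN-0001", "cpe": TARGET_CPE},
    {"id": "VULN-0002", "cpe": TARGET_CPE},
    {"id": "VULN-0003", "cpe": None},
    {"id": "VULN-0004", "cpe": None},
    {"id": "VULN-0005", "cpe": None},
]


def search_by_cpe(records, cpe):
    """Return the IDs of records whose CPE name exactly matches the query."""
    return [r["id"] for r in records if r.get("cpe") == cpe]


visible = search_by_cpe(records, TARGET_CPE)
coverage = len(visible) / len(records)

print(visible)   # only the two records that carry a CPE name
print(coverage)  # fraction of applicable records the search can see
```

The three records without a CPE name still describe real vulnerabilities in the product. The search simply has nothing to match against.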

Even worse, in January 2026, NIST announced that, because so many new CVE records are flooding in, they have to further reduce the number of CVE records they “enrich” (NIST’s term for adding a CPE name and other information like CVSS score to a record); the result will inevitably be that in the future, fewer than a quarter of new CVE records will be visible to product searches in the NVD.[ii] In other words, if a software product/version used by your organization has had a vulnerability reported against it so far this year, there is less than a 25% likelihood that a search for that product/version in the NVD will identify that vulnerability.

The upshot of this situation is that, if truly automated software vulnerability management is ever going to be possible, PURL will need to be the primary software identifier used in the CVE program. However, there is still one hurdle preventing truly widespread use of PURL: Currently, PURL cannot identify commercial (proprietary) software. Since most private and public sector organizations in the world rely primarily on commercial software to run their operations, this obstacle needs to be removed, so that users of commercial software products can easily learn about vulnerabilities present in those products by using PURL.

Fortunately, the outlines of a solution to this problem are becoming clear: The supplier of a commercial software product needs to make a small amount of information – primarily product name, version string and supplier URL – available to users of the product, as well as to any other parties wishing to learn about vulnerabilities that affect the product/version. The user can utilize this information to create a PURL for the product/version[iii], then search a vulnerability database to learn what vulnerabilities affect that product/version.

However, the decisions regarding how best to expand PURL to cover commercial software need to be made by people with a stake in the outcome, including software developers, end user organizations, vulnerability service providers, PURL community members, and anyone else interested in seeing this topic finally addressed.

OWASP has formed a PURL Expansions Working Group to address this problem; Steve Springett, Global Chairman of the OWASP Board and leader of the OWASP CycloneDX and Dependency Track projects, and I will co-lead this group. We would love to have you join us and/or support the project monetarily through donations to OWASP, a 501(c)(3) nonprofit corporation. Please email me to discuss this further.

[i] OSV publishes a schema that is utilized by about twenty open source communities, including GitHub and Python. Vulnerabilities published by those communities are made available in the OSV.dev database.

[ii] In the January announcement, NIST also stated that they will “transfer” the vulnerability enrichment work to the CNAs. However, since CPE has so many problems and is a fundamentally unreliable identifier, previous attempts to get the CNAs to add CPE names to their new CVE records have mostly failed. There is no reason to believe this attempt will not also fail.

[iii] The supplier should not only make available the three fields required to construct a PURL, but they should also construct the PURL itself and include it with the other information.

Tom Alrich's Blog, too
