-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add relationships field to enhance metadata on ID relationships #133
Add relationships field to enhance metadata on ID relationships #133
Conversation
Signed-off-by: Josh Buker <[email protected]>
Hey Josh, we can't break the schema, and would need a very good reason to create a new major version. I'll ask some more clarifying questions in #53. |
@oliverchang This would be backwards compatible, as you could still have an array of strings. It's just that you can now also do an array of objects instead if you want. (anyOf) |
That said, the types I provided as examples definitely need refined and perhaps slimmed down to purely known use-cases. For now, duplicate (and perhaps alias to consolidate that?) are the only ones I know of from the GSD perspective. @oliverchang, @darakian, are there any that would be useful from the Google/GitHub perspective? I could see GHSA's including parent/child relationships for a root-cause vulnerability and the affected OSS packages, for example. EDIT: Oh, and the SIMILAR_TO is something we've seen with particular CVEs that say "this is NOT CVE-x CVE-y CVE-z" - @kurtseifried might know the example IDs off the top of his head. EDIT 2: There would need to be a RELATED or similar type for explicit backwards compatibility with the untyped string array. (i.e. if the string array is used, all entries are assumed to be of |
Ideally, if a v2 schema was possible:
|
Alternatively, can also add |
One comment: I think we should leave aliases and related as is. 1) backwards compatibility (assuming anyone uses it) and 2) it also nicely solves the case for which we have related data but don't know what the relationship is. In my mind it would be nice to classify the relationship so that every end user doesn't have to read the links and figure it out, e.g. answers like "downstream vendor that ships this" or "vuln2 was found because of an incomplete fix for this vuln" and so on. Thus a possible solution where URLs start in related and then graduate to relationships once we know what the relationship is, also we leave the URL in related so that older tooling that can't read relationships still get the data. |
What does it mean to be in a parent child relationship though? Is that package Beyond that how do we collect and validate this data? How do we use this data? I find it very difficult to know at advisory review time what relations may exist and I'm not sure what behavior change I would expect from someone receiving an advisory if it had relationship data. |
This is the primary use-case I was imagining. For example, with Log4Shell you could have a root vuln for Log4j itself, with child IDs for each library or service that consumes it (Minecraft, Ubiquity, every Java application on the planet that logs things, etc). This helps keep the root vulnerability minimal and relevant while still providing the valuable meta for each package affected.
That's a good question. With something like GSD, for example, this would be folks updating the ID over time as things are discovered, rather than something frontloaded with the original/parent vuln ID. The only behavior change I would see is some additional metadata for humans investigating the root cause. Scanners wouldn't care because it doesn't matter where the affected version comes from (one huge ID with everything everywhere affected, or from a child ID specific to the package scanned). |
In theory this is how CVEs are supposed to work. Rule 7.2.4 in the cve rule set states that a CNA
In practice it can be hard to know if two advisories share the same underlying vulnerable code or not which is where this gets painful.
I guess let me ask; how is a relation different from an alias? An alias is already a relation with a claim that two advisory IDs are |
An alias is a type of relationship; not all relationships are aliases. #53 has more examples of the other kinds of relationships and why they would be helpful. |
Signed-off-by: Josh Buker <[email protected]>
Signed-off-by: Josh Buker <[email protected]>
I do get wanting more fidelity out of relations, but looking at that issue it seems like an impossibility to maintain those with any sort of consistency. |
Signed-off-by: Josh Buker <[email protected]>
Signed-off-by: Josh Buker <[email protected]>
30e01b0
to
6432f8a
Compare
docs/schema.md
Outdated
### relationships[].type field | ||
|
||
Specifies the type of relationship this OSV has to the other identifier. Must | ||
include one of the following types: | ||
|
||
- `ALIAS`: An alias, or identifier that is referring to the exact same | ||
vulnerability. This is for connecting identifiers from different databases, | ||
and not for marking duplicate IDs within the same database, which should use | ||
`DUPLICATED_BY` or `DUPLICATE_OF` respectively. | ||
- `CAUSES`: Causes a related vulnerability, for example Log4Shell causing | ||
binaries that embed the vulnerable version of Log4j to be vulnerable to RCE. | ||
- `CAUSED_BY`: Caused by a related vulnerability, most often an embedded | ||
dependency. | ||
- `COMMON_NAME`: A name used to colloquially refer to a specific, usually high | ||
impact, vulnerability. For example, "Log4Shell" would be a common name for | ||
CVE-2021-44228. | ||
- `DUPLICATED_BY`: Other identifiers within the same database that are marked as | ||
a duplicate of this ID. | ||
- `DUPLICATE_OF`: Points to the canonical identifier for a vulnerability within | ||
a given database. | ||
- `INCOMPLETE_FIX_FOR`: When the remediation for a vulnerability is incomplete, | ||
and causes a related vulnerability. For example, Log4Shell (CVE-2021-44228) | ||
would be an incomplete fix for CVE-2021-45046. | ||
- `INSUFFICIENT_FIX_OF`: Fixes a vulnerability caused by a previous remediation | ||
being incomplete. For example, CVE-2021-45046 would be an insufficient fix of | ||
Log4Shell (CVE-2021-44228). | ||
- `RELATED`: An identifier that is related in an unspecified way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@darakian Updated this with some descriptions for each relationship type.
Could you expand a bit on the concerns around consistency?
Existing tooling could continue using aliases
/related
or switch to relationships
using just the ALIAS
/RELATED
types, so this would primarily expand what other databases would be able to do as far as data enrichment goes (one of the main use-cases for GSD).
Signed-off-by: Josh Buker <[email protected]>
Signed-off-by: Josh Buker <[email protected]>
Closing. I share the same concerns with @darakian here around this being not feasible to maintain with consistency in a practical way. This also complicates the schema a fair bit, and fragments it (i.e. there are now two ways to specify aliases). |
Fixes #53