# Fortify SCA SARIF inaccuracy causing poor GitHub Code Scanning experience #72
@simon-engledew, thanks for reporting this. I'm aware that the output isn't strictly following the expectation of what kind of data is supplied in … The primary cause is not necessarily in FortifyVulnerabilityExporter itself, but in how Fortify SSC/FoD expose vulnerability data in the REST API that is used by this integration. At the time when I originally developed this, generating a separate rule for every finding seemed like the best solution, and didn't seem to have any significant impact on user experience. Can you give some specific examples/screenshots of how the current approach is negatively impacting the user experience? Based on that, I can research whether there may be better ways to represent rules and results. However, again, this is dependent on the data provided by the SSC/FoD REST API. I don't think the current APIs return generic, issue-agnostic Fortify category descriptions, so if we are to generate issue-agnostic …
Thanks for the explanation! That totally tracks with what I am seeing. 👍 For context, I work at GitHub, hi! 👋 😅 We have various internal limits to stop Code Scanning tools from breaking UI elements / internal endpoints, and Fortify SCA is tripping them as it generates so many more rules than anything else 😬 😂. Things don't look too broken, but we are discarding rules to keep pages loading. 🙇 In the short term, if you could drop `fullDescription`/`help`/`properties` from rules and put some or all of that information in the results message field, that would avoid creating so many unique rules. ❤️
Thanks for the additional info. So, if you're discarding rules, I guess users may not see the bottom block (tool/rule-id and description) from the screenshot below if we report many issues? At what point does GitHub start discarding rules?

If I reduce/de-duplicate the rules by removing issue-specific information, obviously rule IDs for existing issues will change. Will this have any impact on issues/rules reported by previous runs of this utility on an existing repository, other than having a different layout?

As in the example below, the text that's currently displayed in the rule block may be fairly long, so I don't think it would provide a good user experience if we move all this information to the results message field, as that's shown inline with code snippets. Does the text block that shows the results message field also show a 'Show More' link if the text is too long, or will it always show the full text?

If there's no text to be shown in the rule block, will GitHub still show that block? I think it would look a bit weird to show an empty rule block.
Funnily enough that's not actually a problem! 😂 It's more the ancillary things like API calls and rule selector drop-downs - we also have some internal indexing that's returning truncated results.
IIRC we have an internal limit of 1,000. Fortify SCA is on about 300,600. 😂
The rule ID does need to line up with the previous analysis in order for the alert to be classified as the same, yeah. 😞
It will show the block with the text "No rule help". Not ideal. 😞 The main problem is the large number of unique SARIF identifiers. 😱 Ultimately it is our responsibility to make this work regardless of what SARIF we receive, but at some point we may have to start cleaning up data or redacting rules. 🤔 It sounds like there isn't much you can do given the output the tool gives you?
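As an aside, a quick way to check whether a SARIF file is approaching the rule limit discussed above is to count its distinct rule IDs before uploading. A minimal sketch (the traversal follows the standard SARIF layout, `runs[].tool.driver.rules[].id`; the file path is whatever you pass in):

```python
import json

def count_unique_rules(sarif_path):
    """Count distinct rule IDs across all runs in a SARIF file."""
    with open(sarif_path) as f:
        sarif = json.load(f)
    rule_ids = set()
    for run in sarif.get("runs", []):
        for rule in run.get("tool", {}).get("driver", {}).get("rules", []):
            rule_ids.add(rule["id"])
    return len(rule_ids)
```

A tool that reports a unique rule per finding will see this number grow with every scan, whereas a tool with issue-agnostic rules keeps it roughly constant.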
@simon-engledew, thanks again for the info. Given that number of 300,600, I guess this means that you're storing rules across repositories rather than per-repository? I was expecting that both … I've been looking in more detail at the data that's being returned by our product REST endpoints:
Ideally I'd like to keep SARIF output for both integrations the same, though. Even if we were able to generate issue-agnostic SARIF rules, there's also a lot of issue-specific information that we're currently showing. I'm not sure whether it's viable to move all this information to the results message field, given that this is shown inline with the code on GitHub. Do you have any suggestions as to how best to handle this?

Side note: I'm migrating this functionality to another multi-purpose CLI tool, and eventually FortifyVulnerabilityExporter will be deprecated. So, if I were to implement any changes based on this issue, they will probably go into the other tool:
Yeah, it's a lot of data so we have to deduplicate the storage of rule text.
No, sorry; I think fundamentally SARIF is quite tailored to the way CodeQL wants to send results. 😞 Your best bet is to take a look at how CodeQL structures its rules and messages and try to mimic that as much as possible. 😬 Here is an example of CodeQL's rule help for an integer overflow issue: https://github.com/github/codeql/blob/a9bab18804e28159c81f6ab7b083df53b58f367f/go/ql/src/Security/CWE-681/IncorrectIntegerConversionQuery.qhelp#L5 That ends up with this nice (fairly general) help document that is true for all occurrences of an alert result: and a fairly short result message:
That's helpful context, thanks. ❤️
Thanks again for the info. In theory, we should be able to generate similar issue-agnostic rule descriptions, again similar to what's shown here, for example. As mentioned, the problem is that the products that FortifyVulnerabilityExporter integrates with don't expose these issue-agnostic descriptions (not at all for SSC, somewhat difficult to extract for FoD). I've already raised this concern with the respective product managers, but even if they agree to implement features to support this use case, it will probably take a considerable amount of time before this becomes available. As an alternative, I'll investigate whether it's possible to get the issue-agnostic rule descriptions from another source like VulnCat (or the data behind VulnCat), but currently VulnCat doesn't offer any API for this purpose. Until then, I only see two options: leave things as they are, or use empty rule descriptions (at least for the SSC integration). The latter would however have a significant impact on user experience. I'll keep you posted on any progress.

Edit: side question, how does GitHub handle changes to rule descriptions over time? Does GitHub potentially store multiple descriptions for the same rule ID (showing the description that matches the contents of the SARIF file for a particular repository), or will it only store the first/last submitted rule description?
We store the changes over time. 👍
Hi @simon-engledew, I'm still researching options to improve our SARIF output, and have one additional question for now. Our issue-specific rule descriptions currently include an issue-specific link back to our portal, allowing users to view additional issue details for an individual issue. If we were to generate issue-agnostic rules (as they're meant to be 😉), where should this link go? Will GitHub properly render any links in …? Thanks,
@simon-engledew, thanks! According to https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#result-object, only …
I think the SARIF spec is right, so it's quite likely I've either a) made a mistake or b) we support it as a happy accident. 😅 Looking through a big old folder of example SARIF files, I don't see any that use … That said, I definitely do see the link being rendered when I send some markdown, so maybe both things are true. 😂
@simon-engledew Not sure I understand, do you mean that …
Sure! I've modified a CodeQL SARIF file and added a link into `results[].message.text`.

```json
{
  "$schema" : "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version" : "2.1.0",
  "runs" : [ {
    "tool" : {
      "driver" : {
        "name" : "CodeQL",
        "organization" : "GitHub",
        "semanticVersion" : "2.0.0",
        "rules" : [ {
          "id" : "js/unused-local-variable",
          "name" : "js/unused-local-variable",
          "shortDescription" : {
            "text" : "Unused variable, import, function or class"
          },
          "fullDescription" : {
            "text" : "Unused variables, imports, functions or classes may be a symptom of a bug and should be examined carefully."
          },
          "defaultConfiguration" : {
            "level" : "note"
          },
          "properties" : {
            "tags" : [ "maintainability" ],
            "kind" : "problem",
            "precision" : "very-high",
            "name" : "Unused variable, import, function or class",
            "description" : "Unused variables, imports, functions or classes may be a symptom of a bug\n and should be examined carefully.",
            "id" : "js/unused-local-variable",
            "problem.severity" : "recommendation"
          }
        } ]
      }
    },
    "results" : [ {
      "ruleId" : "js/unused-local-variable",
      "ruleIndex" : 0,
      "message" : {
        "text" : "Unused variable [foo](https://github.com/)."
      },
      "locations" : [ {
        "physicalLocation" : {
          "artifactLocation" : {
            "uri" : "main.js",
            "uriBaseId" : "%SRCROOT%",
            "index" : 0
          },
          "region" : {
            "startLine" : 2,
            "startColumn" : 7,
            "endColumn" : 10
          }
        }
      } ],
      "partialFingerprints" : {
        "primaryLocationLineHash" : "39fa2ee980eb94b0:1",
        "primaryLocationStartColumnFingerprint" : "4"
      }
    } ],
    "newlineSequences" : [ "\r\n", "\n", "\u2028", "\u2029" ],
    "columnKind" : "utf16CodeUnits",
    "properties" : {
      "semmle.formatSpecifier" : "sarif-latest"
    }
  } ]
}
```

Specifically:

```json
"message" : {
  "text" : "Unused variable [foo](https://github.com/)."
},
```

I've uploaded it to a test repository and I can use the link:
Great, thanks! According to the SARIF specification, …
Hi @simon-engledew, can you please review the SARIF files in this repository (and associated GitHub Code Scanning alerts, if you have access) to verify that these meet GitHub expectations? Note that different Fortify rules may generate the same types of results, so there may be some duplication in rule descriptions. For example, the …

As mentioned before, I won't be updating SARIF output for FortifyVulnerabilityExporter, as we plan on deprecating this tool (and a proper fix would be difficult to implement based on the current tool architecture). We'll advise customers to migrate to fcli once we release similar export functionality. For reference, the SARIF files referenced above were generated using an fcli development version, based on these yaml files:

Side question 1: GitHub previously allowed importing a maximum of 1,000 results (and would throw an error if a SARIF file contained more than 1,000 results), but it looks like more results are now supported: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#validating-your-sarif-file. Can you please confirm?

Side question 2: You mentioned that rule descriptions get deduplicated across all repositories on GitHub. Based on some quick tests, I assume you deduplicate based on description hashes, and not based on other SARIF properties like rule ID, correct? Or, in other words, using a SARIF file like this one, it wouldn't be possible to 'hijack' rule descriptions (with the same rule ID and tool properties) in other repositories, correct?
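For illustration only, the deduplication behavior being asked about here could look something like the following sketch. This is an assumption about how content-hash-based dedup might work, not a description of GitHub's actual internals; all names are hypothetical:

```python
import hashlib

def description_key(rule):
    """Hypothetical dedup key: a hash of the rule's descriptive text only,
    deliberately ignoring rule id and tool name."""
    text = rule.get("fullDescription", {}).get("text", "")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two rules with different IDs but identical description text would
# collapse to the same stored description under such a scheme.
rule_a = {"id": "fortify/sql-injection", "fullDescription": {"text": "SQL injection…"}}
rule_b = {"id": "other-tool/rule-42", "fullDescription": {"text": "SQL injection…"}}
```

Under this scheme, identical text shares storage, but a repository submitting the same rule ID with different text simply produces a new key, so it cannot overwrite descriptions seen elsewhere.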
Sorry for taking so long to get back to you! 🙇 😓
I can already see that you only use 45 rules for ~800 alerts, so that looks like a big improvement 😂 I guess the question is: do you generate the same rules between multiple uploads? It looks like there have been 77 uploads (2 of which have alerts) and 45 unique rules across 818 alerts, but there is no overlap at all between the two analyses. I would expect that both analyses should share the same rules if they are describing the same alert. (This might just be because it's test data, though.) Looks like your 'more information' link points to localhost? I figure you already know that, but I thought I would mention it. 😁
Yup! Looks like the default limit is currently …
Yes 👍
Hi @simon-engledew, reviving this old thread. As discussed above, the current Fortify integration now generates a single SARIF rule for every Fortify rule ID (instead of generating a separate SARIF rule for every individual vulnerability), so this should allow GitHub to properly deduplicate the SARIF rules that we generate. Note, though, that some customers may still be using the old approach, especially since we're still waiting for GitHub to merge actions/starter-workflows#2588.

However, based on a customer request, we're planning to make a small change to how we generate the SARIF rules, and would like to get your opinion on how this would affect deduplication on the GitHub side. Vulnerabilities produced by a single Fortify rule might have different severities, based on accuracy and likelihood. Therefore, with the current implementation, we can't include severity data in the SARIF rules. As such, I'm planning to change the SARIF rule ID to … Can you please comment on any impact that this might have on the GitHub side?
👋 Hello! I'm pretty sure that if you set a severity on the result, it should override the severity of the rule, so that shouldn't be necessary. 😅 Example
(This also works for security severity when you set a …)
Hi @simon-engledew, thanks for the quick response. From what I found in the SARIF specification and related GitHub docs, the only severity-related property that can be set on result entries is … In the old implementation (with 1 rule per vulnerability), we set the …

In the worst case, this means that we'd generate 4 SARIF rules for every individual Fortify rule. In reality, I think many rules will stay within the same severity range, for example producing only 'Low' findings, or maybe producing 'Medium' and 'Low' findings only. So, we may be producing only 1 or 2 SARIF rules for many of the individual Fortify rules.

Anyway, unless you can think of a better approach, I'd like to understand what impact this may have on the GitHub deduplication algorithm, and/or whether this change could potentially cause issues on the GitHub side. If you're deduplicating based on only the … Of course, in any case, we'd just produce up to 4 times more …
You definitely won't be able to override tags or precision, but overriding security severity is fine. I'm surprised it's not documented. 🤔 Here is a full example: Example
Specifically, you should add a `security-severity` float to your …
If you are having to override tags or precision, then something is off; I would really encourage you to have only one rule for each problem you are looking for. 😅 Tags and precision should not be used for things that are specific to results. In this case, it sounds like you could represent your findings by overriding both security severity and severity level, in order to have a single rule serve all your results. If you really need to call out the precision of the result, you'd probably have to do it in the message.
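A minimal sketch of the result-level override approach described above, based on what was confirmed informally in this thread rather than on documented behavior (the rule ID and message are illustrative; the score is a string, mirroring the rule-level `security-severity` convention):

```python
def make_result(rule_id, message, level, security_severity):
    """Build a SARIF result that carries its own severity, so a single
    shared rule can serve findings of different severities."""
    return {
        "ruleId": rule_id,
        "message": {"text": message},
        "level": level,  # e.g. "note", "warning", "error"; overrides the rule's defaultConfiguration.level
        "properties": {
            # Result-level security-severity override (string score, 0.0-10.0)
            "security-severity": security_severity,
        },
    }
```

With this shape, one rule per Fortify rule ID suffices, and each finding's accuracy/likelihood-derived severity travels with the result itself.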
Thanks, just tested, and setting `security-severity` in result properties seems to work fine. It would be good to have this documented at https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#result-object I think this is sufficient for us/the customer, so we can stick to generating a single SARIF rule for each Fortify rule. An added advantage is that rule IDs can then stay the same, so that existing users are not affected by GitHub closing all existing issues and generating new issues due to all issues having different rule IDs.
👋 Hello! Not sure if this is the right place to raise this issue, but we've noticed that the way Fortify SCA is generating SARIF documents is causing a bad user experience with GitHub Code Scanning.
Code Scanning expects that rule metadata will be shared many times between different runs of a Code Scanning tool. A rule should represent a capability of the tool, not information about any specific finding. Information that is scan-specific should be included in the results message field instead (e.g. file paths, container checksums, etc.).
Fortify SCA appears to be generating large numbers of rules, each one with unique, alert-specific information in the help text. Possibly due to the configuration in these files?
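The expected split can be sketched as a minimal SARIF run: one issue-agnostic rule, with the finding-specific detail confined to each result's message. All names and texts below are illustrative, not taken from Fortify's actual output:

```python
# One generic rule shared by many results; per-finding detail lives only
# in each result's message. All identifiers here are illustrative.
rule = {
    "id": "example/hardcoded-password",
    "shortDescription": {"text": "Hard-coded password"},
    "fullDescription": {
        "text": "Storing credentials in source code allows anyone with "
                "repository access to recover them."
    },
}

def make_result(file_path, line):
    """Finding-specific data goes in the message, not in a new rule."""
    return {
        "ruleId": rule["id"],
        "message": {"text": f"Hard-coded password found in {file_path}."},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": file_path},
                "region": {"startLine": line},
            }
        }],
    }

run = {
    "tool": {"driver": {"name": "ExampleTool", "rules": [rule]}},
    "results": [make_result("app/config.py", 12), make_result("lib/db.py", 40)],
}
```

However many findings are reported, the rule count stays at one, which is what keeps the rule-metadata stores and UI selectors mentioned above within limits.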