Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with parsing and SARIF generation for SpaceROS #419

Open
wants to merge 4 commits into
base: spaceros
Choose a base branch
from

Conversation

Ronoman
Copy link

@Ronoman Ronoman commented Nov 1, 2022

  1. Before, the rules field in the SARIF output was polluted with many duplicate rules, as the full error message was included. Now, only the rule id that clang-tidy outputs between brackets (i.e. [google-explicit-constructor]) is taken and stored in rules.
  2. Before, artifact paths sometimes had part or all of the error message in them. No more!
  3. Before, the startLine and startColumn fields were strings. These are now integers.
  4. Before, the result message had a lot more information than was needed. Now, it is just the error message (no location data or rule id).

This was tested on rclcpp and rcutils. ament_clang_tidy ran successfully on both, and outputted valid SARIF. However, I did have to change some of the RegEx's that were parsing clang_tidy output. I don't have a great way to check if I'm truly capturing all of the valid output of clang_tidy, so this may be unintentionally hiding some violations.

Copy link
Contributor

@nuclearsandwich nuclearsandwich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review is positive! A few comments and observations.

ament_clang_tidy/ament_clang_tidy/main.py Show resolved Hide resolved
ament_clang_tidy/ament_clang_tidy/main.py Outdated Show resolved Hide resolved
ament_clang_tidy/ament_clang_tidy/main.py Outdated Show resolved Hide resolved
ament_clang_tidy/ament_clang_tidy/main.py Outdated Show resolved Hide resolved
ament_clang_tidy/ament_clang_tidy/main.py Show resolved Hide resolved
@Ronoman
Copy link
Author

Ronoman commented Nov 8, 2022

It seems like I made a poor assumption with some of the output, but I'm not sure how to resolve it. Most clang_tidy errors look like this:

/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:153:14: warning: Call to function 'vsnprintf' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'vsnprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
  int size = vsnprintf(char_array->buffer, char_array->buffer_capacity, format, args_clone);
             ^

These are matched and groups are extracted properly with the regex as it stands. However, some output looks like this:

/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:123:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11

Here, the rule ID is missing from the result. In most cases, these are preceded by the same exact error, like this block:

/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
    memcpy(dst_ptr, src_ptr, array_list->impl->data_size * copy_count);
    ^
/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11

In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to array_list.c:175:5). However, this isn't always the case. It seems like clang_tidy does not add the rule ID when it is reporting note: results, as opposed to warning: or error: results. How can we capture this different clang_tidy output properly in SARIF?

@nuclearsandwich
Copy link
Contributor

In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to array_list.c:175:5). However, this isn't always the case. It seems like clang_tidy does not add the rule ID when it is reporting note: results, as opposed to warning: or error: results. How can we capture this different clang_tidy output properly in SARIF?

It doesn't look like clang-tidy itself supports formatting its output nor does its list of checks state what level a check is is (note, warning, or error). The ruleID is optional right? Can we just omit it for notes?

@Ronoman
Copy link
Author

Ronoman commented Nov 9, 2022

It doesn't look like clang-tidy itself supports formatting its output nor does its list of checks state what level a check is is (note, warning, or error). The ruleID is optional right? Can we just omit it for notes?

According to The Spec, it looks like we can omit result.ruleId in some cases. The relevant sentence (I think) is this:

If theDescriptor does not exist (that is, if theTool does not contain a reportingDescriptor object (§3.49) that describes the rule that was violated), then rule SHALL NOT be present.

theTool probably does have the relevant rule ID, but since the result for notes doesn't give it to us, I think we're okay to omit it. In most of the cases I've seen with clang_tidy, the result description field is usually sufficient to understand what the violation is, and what kind of action needs to be taken to resolve it.

Eli Benevedes and others added 4 commits November 19, 2022 00:47
Signed-off-by: Eli Benevedes <[email protected]>

Signed-off-by: Eli Benevedes <[email protected]>
Signed-off-by: Eli Benevedes <[email protected]>

Signed-off-by: Eli Benevedes <[email protected]>
Co-authored-by: Steven! Ragnarök <[email protected]>
Signed-off-by: Eli Benevedes <[email protected]>
Signed-off-by: Eli Benevedes <[email protected]>

Signed-off-by: Eli Benevedes <[email protected]>
@Ronoman
Copy link
Author

Ronoman commented Nov 19, 2022

Resolved all outstanding comments in 5d5e2ed. However, a new slight issue arises...

I've updated the regex rules to capture all clang_tidy results, even one that don't end in a [rule description]. However, this means that process_sarif is no longer going to de-duplicate results in clang_tidy properly. On line 106 of sarif_helpers, we check if the tuple (threeple?) (ruleId, artifact, region) has been seen before. With these changes, there are now results in clang_tidy that are exact duplicates, but some are missing the ruleId, so this check is no longer sufficient.

Is it okay to reduce the tuple to (artifact, region)? This would mean that any results that are pointing to the same file, with the exact same region (line number, sometimes column number) will be considered identical. I haven't yet observed two unique results pointing to the exact same artifact and region, so this seems like a reasonable assumption, but not always guaranteed to be true.

@mjeronimo
Copy link
Contributor

mjeronimo commented Nov 22, 2022

Perhaps it would be more accurate to keep the (ruleId, artifact, region) and only make the reduction to (artifact, region) if necessary. In other words, ruleId would be optional in the comparison function, but used if present in both items to be compared.

@mjeronimo mjeronimo changed the title Fixed a few issues with parsing and SARIF gen for SpaceROS Fix issues with parsing and SARIF generation for SpaceROS May 16, 2023
@ivanperez-keera
Copy link

I'm following up on this (we have a PR in Space ROS that depends on this).

Does it still make sense to do this? If so, what's missing?

@EzraBrooks
Copy link

clang-tidy can be configured to output a YAML-formatted "fixes file" which is probably the way I'd recommend to ingest tidy output programmatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants