Fix issues with parsing and SARIF generation for SpaceROS #419

Ronoman · 2022-11-01T00:23:08Z

Before, the rules field in the SARIF output was polluted with many duplicate rules, because the full error message was included. Now, only the rule id that clang-tidy outputs between brackets (e.g. [google-explicit-constructor]) is taken and stored in rules.
Before, artifact paths sometimes had part or all of the error message in them. No more!
Before, the startLine and startColumn fields were strings. These are now integers.
Before, the result message had a lot more information than was needed. Now, it is just the error message (no location data or rule id).

This was tested on rclcpp and rcutils. ament_clang_tidy ran successfully on both and produced valid SARIF. However, I did have to change some of the regexes that parse clang_tidy output. I don't have a great way to check whether I'm truly capturing all valid clang_tidy output, so this may be unintentionally hiding some violations.
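To make the four changes above concrete, here is a minimal sketch of how one diagnostic line could be split into its parts. It is not the regex from this PR: the pattern `DIAGNOSTIC` and the helper `parse_diagnostic` are hypothetical names, and the pattern only assumes the `path:line:column: level: message [rule-id]` shape described here.

```python
import re

# Hypothetical pattern (an assumption, not the regex used in ament_clang_tidy):
#   path:line:column: level: message [rule-id]
DIAGNOSTIC = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<level>warning|error): (?P<message>.*) \[(?P<rule>[\w.,-]+)\]$"
)


def parse_diagnostic(text):
    """Split one clang-tidy diagnostic line into its parts, or return None."""
    match = DIAGNOSTIC.match(text)
    if match is None:
        return None
    return {
        "path": match["path"],            # file path only, no message text
        "line": int(match["line"]),       # integer, not string
        "column": int(match["column"]),   # integer, not string
        "level": match["level"],
        "message": match["message"],      # no location data, no rule id
        "rule": match["rule"],            # only the bracketed rule id
    }


example = (
    "/src/a.cpp:12:3: warning: single-argument constructors must be "
    "marked explicit [google-explicit-constructor]"
)
parsed = parse_diagnostic(example)
print(parsed["rule"], parsed["line"], parsed["column"])
```

Collecting `parsed["rule"]` values into a set would then give one entry per rule instead of one per message.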

nuclearsandwich

Initial review is positive! A few comments and observations.

ament_clang_tidy/ament_clang_tidy/main.py

Ronoman · 2022-11-08T19:56:40Z

It seems like I made a poor assumption with some of the output, but I'm not sure how to resolve it. Most clang_tidy errors look like this:

/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:153:14: warning: Call to function 'vsnprintf' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'vsnprintf_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
  int size = vsnprintf(char_array->buffer, char_array->buffer_capacity, format, args_clone);
             ^

These are matched and groups are extracted properly with the regex as it stands. However, some output looks like this:

/home/spaceros-user/src/spaceros/src/rcutils/src/char_array.c:123:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11

Here, the rule ID is missing from the result. In most cases, these are preceded by the same exact error, like this block:

/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
    memcpy(dst_ptr, src_ptr, array_list->impl->data_size * copy_count);
    ^
/home/spaceros-user/src/spaceros/src/rcutils/src/array_list.c:175:5: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11

In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to array_list.c:175:5). However, this isn't always the case. It seems like clang_tidy does not add the rule ID when it is reporting note: results, as opposed to warning: or error: results. How can we capture this different clang_tidy output properly in SARIF?
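One possible way to accept both shapes is to make the trailing bracketed rule ID optional in the pattern. The sketch below is only an illustration under that assumption; `LINE` and `parse_line` are hypothetical names, not code from this PR.

```python
import re

# Hypothetical pattern: the trailing " [rule-id]" is optional, so that
# note: lines without a rule ID still match.
LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<level>note|warning|error): "
    r"(?P<message>.*?)(?: \[(?P<rule>[\w.,-]+)\])?$"
)


def parse_line(text):
    """Return the parts of a diagnostic line; 'rule' is None when absent."""
    match = LINE.match(text)
    if match is None:
        return None  # e.g. source snippets and caret lines
    parts = match.groupdict()
    parts["line"] = int(parts["line"])
    parts["column"] = int(parts["column"])
    return parts


warning = parse_line(
    "/src/array_list.c:175:5: warning: Call to function 'memcpy' is insecure "
    "[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]"
)
note = parse_line(
    "/src/array_list.c:175:5: note: Call to function 'memcpy' is insecure"
)
print(warning["rule"])  # the bracketed rule ID
print(note["rule"])     # None
```

The message group is lazy so that the optional rule group gets the first chance to claim a bracketed suffix at the end of the line.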

ament_clang_tidy/ament_clang_tidy/main.py

nuclearsandwich · 2022-11-09T21:39:28Z

> In these cases, it's safe to throw away the second result since it is duplicating one already listed (note that both results are pointing to array_list.c:175:5). However, this isn't always the case. It seems like clang_tidy does not add the rule ID when it is reporting note: results, as opposed to warning: or error: results. How can we capture this different clang_tidy output properly in SARIF?

It doesn't look like clang-tidy itself supports formatting its output, nor does its list of checks state what level a check is (note, warning, or error). The ruleId is optional, right? Can we just omit it for notes?

Ronoman · 2022-11-09T23:28:28Z

> It doesn't look like clang-tidy itself supports formatting its output, nor does its list of checks state what level a check is (note, warning, or error). The ruleId is optional, right? Can we just omit it for notes?

According to The Spec, it looks like we can omit result.ruleId in some cases. The relevant sentence (I think) is this:

If theDescriptor does not exist (that is, if theTool does not contain a reportingDescriptor object (§3.49) that describes the rule that was violated), then rule SHALL NOT be present.

theTool probably does have the relevant rule ID, but since the result for notes doesn't give it to us, I think we're okay to omit it. In most of the cases I've seen with clang_tidy, the result description field is usually sufficient to understand what the violation is, and what kind of action needs to be taken to resolve it.
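As a rough illustration of omitting the rule ID, a result object could be built so that the `ruleId` key is simply absent for notes. The property names (`ruleId`, `level`, `message.text`, `locations`, `physicalLocation`, `artifactLocation.uri`, `region.startLine`, `region.startColumn`) are from SARIF 2.1.0; the helper `make_result` is a hypothetical name and not part of ament_clang_tidy.

```python
import json


def make_result(path, line, column, level, message, rule=None):
    """Build one SARIF result; ruleId is left out when no rule ID is known."""
    result = {
        "level": level,  # "note", "warning" and "error" are valid SARIF levels
        "message": {"text": message},
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {"uri": path},
                    "region": {"startLine": line, "startColumn": column},
                }
            }
        ],
    }
    if rule is not None:
        result["ruleId"] = rule  # only present when clang-tidy printed one
    return result


note = make_result(
    "src/char_array.c", 123, 5, "note",
    "Call to function 'memcpy' is insecure",
)
print(json.dumps(note, indent=2))
```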

Ronoman · 2022-11-19T00:54:10Z

Resolved all outstanding comments in 5d5e2ed. However, a new slight issue arises...

I've updated the regex rules to capture all clang_tidy results, even ones that don't end in a [rule description]. However, this means that process_sarif is no longer going to de-duplicate clang_tidy results properly. On line 106 of sarif_helpers, we check if the tuple (threeple?) (ruleId, artifact, region) has been seen before. With these changes, there are now results in clang_tidy that are exact duplicates, but some are missing the ruleId, so this check is no longer sufficient.

Is it okay to reduce the tuple to (artifact, region)? This would mean that any results pointing to the same file, with the exact same region (line number, sometimes column number), will be considered identical. I haven't yet observed two unique results pointing to the exact same artifact and region, so this seems like a reasonable assumption, but it is not guaranteed to hold.

mjeronimo · 2022-11-22T18:44:25Z

Perhaps it would be more accurate to keep the (ruleId, artifact, region) and only make the reduction to (artifact, region) if necessary. In other words, ruleId would be optional in the comparison function, but used if present in both items to be compared.
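A minimal sketch of that comparison, assuming each result has already been flattened to a dict with `artifact`, `region`, and an optional `ruleId` (a hypothetical shape, not the actual data structure in process_sarif's sarif_helpers):

```python
def same_finding(a, b):
    """Compare two results; ruleId only counts when both results have one."""
    if (a["artifact"], a["region"]) != (b["artifact"], b["region"]):
        return False
    rule_a, rule_b = a.get("ruleId"), b.get("ruleId")
    if rule_a is not None and rule_b is not None:
        return rule_a == rule_b
    return True  # at least one ruleId is missing: fall back to location only


def deduplicate(results):
    """Drop duplicates, preferring the copy that carries a ruleId."""
    unique = []
    for candidate in results:
        for index, kept in enumerate(unique):
            if same_finding(kept, candidate):
                if kept.get("ruleId") is None and candidate.get("ruleId") is not None:
                    unique[index] = candidate
                break
        else:
            unique.append(candidate)
    return unique


results = [
    {"artifact": "array_list.c", "region": (175, 5), "ruleId": "rule-a"},
    {"artifact": "array_list.c", "region": (175, 5)},  # note without ruleId
    {"artifact": "array_list.c", "region": (175, 5), "ruleId": "rule-b"},
]
print(len(deduplicate(results)))
```

One caveat with this approach: the comparison is not transitive, since a result without a ruleId matches two results at the same location that carry different rule IDs, so it does not map directly onto a set of hashable keys.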

ivanperez-keera · 2023-10-03T19:30:21Z

I'm following up on this (we have a PR in Space ROS that depends on this).

Does it still make sense to do this? If so, what's missing?

EzraBrooks · 2023-11-09T01:59:28Z

clang-tidy can be configured to output a YAML-formatted "fixes file" which is probably the way I'd recommend to ingest tidy output programmatically.
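If that route were taken, one detail to plan for is location data. As far as I know, the file written by clang-tidy's `--export-fixes` option records each diagnostic's position as a byte offset into the file (a `FileOffset` field) rather than as a line and column, so it would need converting before it could fill SARIF's `startLine` and `startColumn`. The sketch below shows only that conversion; `offset_to_line_column` is a hypothetical helper, and the claim about the fixes-file layout is an assumption to verify against the clang-tidy version in use.

```python
def offset_to_line_column(data: bytes, offset: int):
    """Convert a byte offset into 1-based (line, column) for the given file bytes."""
    if not 0 <= offset <= len(data):
        raise ValueError("offset is outside the file")
    # Lines are counted by the newlines that occur before the offset.
    line = data.count(b"\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, which makes the
    # column come out as offset + 1 on the first line.
    last_newline = data.rfind(b"\n", 0, offset)
    column = offset - last_newline
    return line, column


source = b"int a;\nint b;\n"
print(offset_to_line_column(source, 11))  # position of 'b' on the second line
```

The column here is counted in bytes, which matches character columns only for ASCII source files.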

nuclearsandwich requested review from mjeronimo, nuclearsandwich and audrow November 8, 2022 17:47

nuclearsandwich reviewed Nov 8, 2022

mjeronimo approved these changes Nov 8, 2022

Eli Benevedes and others added 4 commits November 19, 2022 00:47

Fixed a few issues with parsing and SARIF gen.

2976b06

Signed-off-by: Eli Benevedes <[email protected]>

Cleanup debug.

8703010

Signed-off-by: Eli Benevedes <[email protected]>

Update RegEx comment

16d5e4a

Co-authored-by: Steven! Ragnarök <[email protected]>
Signed-off-by: Eli Benevedes <[email protected]>

Allow clang_tidy results with no ruleId.

5d5e2ed

Signed-off-by: Eli Benevedes <[email protected]>

Ronoman force-pushed the spaceros-clang-tidy-fixes branch from aabbfdb to 5d5e2ed Compare November 19, 2022 00:47

Ronoman mentioned this pull request Nov 19, 2022

Allow for missing ruleId. space-ros/process_sarif#9

Merged

clalancette assigned mjeronimo Dec 15, 2022

mjeronimo changed the title ~~Fixed a few issues with parsing and SARIF gen for SpaceROS~~ Fix issues with parsing and SARIF generation for SpaceROS May 16, 2023

Bckempa mentioned this pull request Dec 18, 2023

Allow for missing ruleId space-ros/process_sarif#13

Closed
