Mutation testing - pr differences #4178

ASuciuX · 2023-12-15T14:27:26Z

Applicable issues

This has to be merged before in order to have the references right. Please review this as well

Test/mutants pr differences actions#4

Additional info (benefits, drawbacks, caveats)

Solutions and Recommended Usage

Cargo-mutant does an incremental build for each mutation, that’s why this solution comes with the implementation of tracking the output and only updating it with the differences, to run 10-20, maybe a few 100s mutants instead of 20,000.

CI YML File: Are linked with the actions in the actions' repo stacks-network/actions#3.

Should be executed with every PR, as it tests only the updated or newly added functions in the PR's commits.

How to read the status from the `pr-differences` workflow

If it is not failing, it means all the functions are tested properly.
If it is failing it will display what type of tests are the problem: missing tests, timeout tests, or unviable tests. For each section, it will specify the functions, the files they are in and their line numbers.

Handling Exit Codes (docs)

Mutation testing produces exit codes post-completion.
In the pr-differences workflow they are caught and specified with extra details about what should be done based on them.

- dockerfile and shell script for specific packages - ci.yml for diff on packages on PR

Co-authored-by: jbencin <[email protected]>

- runs for modified files & created files - has to be run before committing the changes ``` cd mutation-testing/scripts sh git-diff.sh ```

codecov · 2023-12-15T14:41:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4b6638a) 82.43% compared to head (5a4f9b3) 82.59%.

❗ Current head 5a4f9b3 differs from pull request most recent head c16d87c. Consider uploading reports for the commit c16d87c to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4178      +/-   ##
===========================================
+ Coverage    82.43%   82.59%   +0.16%     
===========================================
  Files          402      401       -1     
  Lines       291084   288404    -2680     
===========================================
- Hits        239951   238205    -1746     
+ Misses       51133    50199     -934

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wileyj · 2023-12-20T04:59:22Z

moved to draft so @ASuciuX et al can work through some issues discovered in initial tests

ASuciuX · 2023-12-27T17:01:02Z

The Tracking PR Mutants / Incremental Mutants Testing (pull_request) is failing on this because the reference is not valid ( the actions PR has to be merged first)

Cases: - if >= 16 mutants on big packages => run big packages using 8 shards - else run big packages without shards - if >= 16 mutants on big packages and >= 120 mutants on small packages => run small packages using 4 shards - else if < 16 mutants on big packages and >= 80 mutants on small packages => run small packages using 4 shards - else run small packages without shards

ASuciuX · 2024-01-07T22:29:57Z

Overall Flow Diagram #4210 (comment)

.github/workflows/pr-differences-mutants.yml

wileyj · 2024-01-08T21:19:20Z

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed.
I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

ASuciuX · 2024-01-08T23:04:58Z

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed. I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

The goal here was to run the mutants before the PR's approval, since the workflow is a reference to which of the added/modified functions fail when running mutants, so the developers working on the PR can check and fix them and the others can approve the PR based on the fact that there are no mutants unchecked.

ASuciuX · 2024-01-09T17:28:27Z

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed. I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

The goal here was to run the mutants before the PR's approval, since the workflow is a reference to which of the added/modified functions fail when running mutants, so the developers working on the PR can check and fix them and the others can approve the PR based on the fact that there are no mutants unchecked.

i see. i guess this is probably fine, as long as the diff step works as intended so it's not re-running this workflow unnecessarily.

I'm not certain about what you mean by that, currently it checks the differences between the base branch and the latest commit from the source branch and runs mutants for all the differences It runs it every time, as if a new commit would be pushed to the source branch before the latest run finishes, it would stop it and no output would be available for what was till that point. If the workflow is finalised and there are missed mutants, it would flag them for the current commit, but if they are fixed in a future commit it marks them as caught then. Also, in this manner, there is only one standard final output that can be checked by anyone to see which mutants were created for that specific PR.

the other comment i have here is that the workflow is quite verbose - it would be ideal to move most of the workflow into a composite action.

I've moved pretty much everything that could be moved to composite, what is left is necessary for the way it currently runs. Maybe the output step can be moved in a composite action, in that case, would printing to "$GITHUB_STEP_SUMMARY" still work the way it currently does?

yes, GITHUB_STEP_SUMMARY would still work - what i'm interested in is if the step logic you have here can all be moved to a composite action, and the workflow here would simply run a git checkout and then run the composite action. keeping it as-is isn't incorrect, but creates a larger barrier to resolve issues or make updates.

Technically only a job out of the 4 calling the composite action could be moved - one without strategy matrix - but it would require additional checks in the composite. The other 2 jobs have different matrix dimensions, so there's really no way to get rid of them, since at least 1 job would run without matrix, one would run with 4 and the other with 8 shards. The job with the verification that's done with bash scripts before them is made in order to know which one of the jobs to run, and the only thing that is left to be moved to composite is the output job, which would decrease the lines count by ~35%, but we would still have over 200. If that's the case, do you think I should still move the output job to a composite action, or leave it here?

wileyj · 2024-01-09T18:50:45Z

@ASuciuX it's entirely doable -
the conditionals can remain (mostly, maybe small changes would be needed) - you'd simply have to have an output in the composite action to report a status.

there is a LOT of shell in this workflow. moving that to a different repo would make fixing/updating it much easier.

looking it through, i don't think it's an easy task (mainly due to the checks/outputs you have) but i think it's worthwhile nonetheless for the reason i stated above. it would require some care in the composite action (mainly around inputs/outputs), but again - i feel it's worth the effort and will make maintaining this workflow much easier.

- add required input fields to the output job

wileyj · 2024-01-17T00:03:40Z

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323

even more peculiar is that it tried to download a musl package to satisfy:

info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

ASuciuX · 2024-01-17T15:08:18Z

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323

even more peculiar is that it tried to download a musl package to satisfy:
info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

I've reviewed past worflow executions and noticed that this issue has been consistently present in all of them. I believe it will work alright, but i could change it to https://github.com/cargo-bins/cargo-binstall?tab=readme-ov-file#in-github-actions. The interesting part is that binstall and also the mutants docs recommend using the one I've added.

wileyj · 2024-01-17T15:18:08Z

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323
even more peculiar is that it tried to download a musl package to satisfy:
info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall
I've reviewed past worflow executions and noticed that this issue has been consistently present in all of them. I believe it will work alright, but i could change it to https://github.com/cargo-bins/cargo-binstall?tab=readme-ov-file#in-github-actions. The interesting part is that binstall and also the mutants docs recommend using the one I've added.

my concern is mostly that the workflow failed and nothing else ran as a result:

thread 'main' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/patch-0.7.0/src/parser.rs:84:5:
bug: failed to parse entire input. Remaining: 'diff --git a/stacks-common/src/libcommon.rs b/stacks-common/src/lib.rs

do you know why that error was triggered?

ASuciuX · 2024-01-18T05:28:00Z

@wileyj thanks for pointing it. just fixed it stacks-network/actions#14

it was related to: renaming .rs files & having no mutants found

wileyj · 2024-01-18T17:14:16Z

@ASuciuX the last question i have here is timing. if i understand correctly, the idea is that this workflow will run on every PR.
The current action has been running for over 2 hours: https://github.com/stacks-network/stacks-core/actions/runs/7547829177?pr=4178

can you share some timings that we can expect based on different types of PR's?
for example, when we merge to master for a release - the CI in total takes around 45-60 minutes. adding another 2+ hours to that time may not be desirable.

ASuciuX · 2024-01-19T06:40:18Z

I've identified and resolved an issue with the git diff command used in our workflow, which you can review here. The workflow run details are available here.

Previously, the command was erroneously capturing changes not only from the source branch but also from the base branch, starting from the initiation of the PR up to the present. This resulted in an extended runtime of about 3 hours for the mutants process, encompassing all differences on the develop branch from December 15 to January 18.

All the shards took less than 1 hour and 10 minutes, except 2 shards:

shard-big-0 took 2h 44m because 1h 50m was the timeout mutant
shard-big-2 took 3h 4m because 1h 48m was the timeout mutant

An enhancement that could significantly optimize our runtime involves dynamically terminating mutant processes based on their initial test durations. For instance, if the initial test for stacks-node takes X minutes, we could set a threshold to automatically stop the process after 1.5X minutes and call that a timeout mutant. Similarly, for stackslib, if the initial test takes Y minutes, the process would halt after 1.5Y minutes. Unfortunately, such a dynamic stopping mechanism isn't currently implemented in cargo mutants.

For context, in previous runs:

Small packages typically completed in under 30 minutes, aided by the use of shards.
Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
- Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
- "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
- "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

With the fix applied, the workflow should now correctly process only the relevant changes, leading to more efficient runtimes.

ASuciuX · 2024-01-19T07:00:40Z

On top of the above specified run times, the integration of nextest in cargo-mutants is expected to significantly reduce the time needed for processing caught mutants. For more information, see this discussion.

wileyj · 2024-01-19T18:44:43Z

For context, in previous runs:

Small packages typically completed in under 30 minutes, aided by the use of shards.

Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.

Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.

"Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.

"Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

this is great!
can you add this comment :

- Small packages typically completed in under 30 minutes, aided by the use of shards.
- Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
    - Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    - "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    - "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

to the file ./docs/ci-release.md? ideally under a section regarding the mutant workflow

wileyj

#4178 (comment)

i think one this documentation comment is addressed, everything else looks good

wileyj

shipit

docs/ci-release.md

.github/workflows/pr-differences-mutants.yml

ASuciuX and others added 13 commits November 25, 2023 00:38

feat: mutation testing initial integration

8370213

- dockerfile and shell script for specific packages - ci.yml for diff on packages on PR

Merge branch 'develop' into test/cargo-mutants-testing

06d68b2

fix: made functions discoverable to be mutants

1c5a75c

feat: added mutants output before fix clarity package

19f56ea

feat: added mutants output after fix clarity package

a3caebb

Update mutants-testing-general.sh

e41e89a

Co-authored-by: jbencin <[email protected]>

feat: renamed mod to lib.rs

b1649a3

Delete Dockerfile.mutation-testing as it is also run locally with cargo

32aa967

feat: modular mutations on shell

92b4e23

- runs for modified files & created files - has to be run before committing the changes ``` cd mutation-testing/scripts sh git-diff.sh ```

feat: restructure mutants to new CI workflows

5387f44

feat: update link to stacks action repo

0bf76dd

Merge branch 'develop' into test/cargo-mutants-testing

9616e07

feat: keep only filter pr workflow file

7c91912

ASuciuX self-assigned this Dec 15, 2023

ASuciuX mentioned this pull request Dec 15, 2023

Mutation Testing ( splitted into 2 PRs ) #4089

Closed

6 tasks

added specific triggers for the CI action on PR

491557d

saralab requested review from jcnelson, kantai and wileyj December 15, 2023 15:10

obycode marked this pull request as draft December 19, 2023 15:27

feat: rename from filter-pr to pr-differences

b90dc06

ASuciuX changed the title ~~Mutation testing - filter pr~~ Mutation testing - pr differences Jan 8, 2024

wileyj reviewed Jan 8, 2024

View reviewed changes

.github/workflows/pr-differences-mutants.yml Outdated Show resolved Hide resolved

feat: move shell runs from main workflow to composite action

298e8e0

ASuciuX mentioned this pull request Jan 10, 2024

Move Shells from Main Workflow to Composite Action stacks-network/actions#7

Merged

ASuciuX added 2 commits January 11, 2024 03:28

feat: add documentation for mutation testing

12388b7

feat: mutation testing - update composite branch

5a4f9b3

- add required input fields to the output job

saralab marked this pull request as ready for review January 16, 2024 21:37

feat: renamed back the lib files as cargo-mutants supports them now

16b7bff

Merge branch 'develop' into test/mutants-filter-pr

9b54fcb

saralab requested a review from obycode January 22, 2024 15:41

wileyj requested changes Jan 22, 2024

View reviewed changes

feat: mutants docs - time related outcomes

0875d10

wileyj approved these changes Jan 23, 2024

View reviewed changes