Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutation testing - pr differences #4178

Merged
merged 26 commits into from
Jan 26, 2024

Conversation

ASuciuX
Copy link
Contributor

@ASuciuX ASuciuX commented Dec 15, 2023

Applicable issues

This has to be merged before in order to have the references right. Please review this as well

Additional info (benefits, drawbacks, caveats)

Solutions and Recommended Usage

Cargo-mutant does an incremental build for each mutation, that’s why this solution comes with the implementation of tracking the output and only updating it with the differences, to run 10-20, maybe a few 100s mutants instead of 20,000.

CI YML File: Are linked with the actions in the actions' repo stacks-network/actions#3.

Should be executed with every PR, as it tests only the updated or newly added functions in the PR's commits.

How to read the status from the pr-differences workflow

If it is not failing, it means all the functions are tested properly.
If it is failing it will display what type of tests are the problem: missing tests, timeout tests, or unviable tests. For each section, it will specify the functions, the files they are in and their line numbers.

Handling Exit Codes (docs)

Mutation testing produces exit codes post-completion.
In the pr-differences workflow they are caught and specified with extra details about what should be done based on them.

@ASuciuX ASuciuX self-assigned this Dec 15, 2023
Copy link

codecov bot commented Dec 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4b6638a) 82.43% compared to head (5a4f9b3) 82.59%.

❗ Current head 5a4f9b3 differs from pull request most recent head c16d87c. Consider uploading reports for the commit c16d87c to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4178      +/-   ##
===========================================
+ Coverage    82.43%   82.59%   +0.16%     
===========================================
  Files          402      401       -1     
  Lines       291084   288404    -2680     
===========================================
- Hits        239951   238205    -1746     
+ Misses       51133    50199     -934     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@obycode obycode marked this pull request as draft December 19, 2023 15:27
@wileyj
Copy link
Contributor

wileyj commented Dec 20, 2023

moved to draft so @ASuciuX et al can work through some issues discovered in initial tests

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Dec 27, 2023

The Tracking PR Mutants / Incremental Mutants Testing (pull_request) is failing on this because the reference is not valid ( the actions PR has to be merged first)

Cases:
- if >= 16 mutants on big packages => run big packages using 8 shards
- else run big packages without shards

- if >= 16 mutants on big packages and >= 120 mutants on small packages => run small packages using 4 shards
- else if < 16 mutants on big packages and >= 80 mutants on small packages => run small packages using 4 shards
- else run small packages without shards
@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 7, 2024

Overall Flow Diagram #4210 (comment)

@ASuciuX ASuciuX changed the title Mutation testing - filter pr Mutation testing - pr differences Jan 8, 2024
@wileyj
Copy link
Contributor

wileyj commented Jan 8, 2024

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed.
I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 8, 2024

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed. I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

The goal here was to run the mutants before the PR's approval, since the workflow is a reference to which of the added/modified functions fail when running mutants, so the developers working on the PR can check and fix them and the others can approve the PR based on the fact that there are no mutants unchecked.

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 9, 2024

I'm also considering that you may want to run this workflow only after an approval - otherwise, it will currently trigger on any commit to the branch being PR'ed. I'm ambivalent if that's the right way to go here - but most tests currently do not run until a PR is approved (only a small sample of workflows are run on commits, with goal being faster iteration. if mutant tests are run on every commit, it may slow down that objective).

The goal here was to run the mutants before the PR's approval, since the workflow is a reference to which of the added/modified functions fail when running mutants, so the developers working on the PR can check and fix them and the others can approve the PR based on the fact that there are no mutants unchecked.

i see. i guess this is probably fine, as long as the diff step works as intended so it's not re-running this workflow unnecessarily.

I'm not certain about what you mean by that, currently it checks the differences between the base branch and the latest commit from the source branch and runs mutants for all the differences It runs it every time, as if a new commit would be pushed to the source branch before the latest run finishes, it would stop it and no output would be available for what was till that point. If the workflow is finalised and there are missed mutants, it would flag them for the current commit, but if they are fixed in a future commit it marks them as caught then. Also, in this manner, there is only one standard final output that can be checked by anyone to see which mutants were created for that specific PR.

the other comment i have here is that the workflow is quite verbose - it would be ideal to move most of the workflow into a composite action.

I've moved pretty much everything that could be moved to composite, what is left is necessary for the way it currently runs. Maybe the output step can be moved in a composite action, in that case, would printing to "$GITHUB_STEP_SUMMARY" still work the way it currently does?

yes, GITHUB_STEP_SUMMARY would still work - what i'm interested in is if the step logic you have here can all be moved to a composite action, and the workflow here would simply run a git checkout and then run the composite action. keeping it as-is isn't incorrect, but creates a larger barrier to resolve issues or make updates.

Technically only a job out of the 4 calling the composite action could be moved - one without strategy matrix - but it would require additional checks in the composite. The other 2 jobs have different matrix dimensions, so there's really no way to get rid of them, since at least 1 job would run without matrix, one would run with 4 and the other with 8 shards. The job with the verification that's done with bash scripts before them is made in order to know which one of the jobs to run, and the only thing that is left to be moved to composite is the output job, which would decrease the lines count by ~35%, but we would still have over 200. If that's the case, do you think I should still move the output job to a composite action, or leave it here?

@wileyj
Copy link
Contributor

wileyj commented Jan 9, 2024

@ASuciuX it's entirely doable -
the conditionals can remain (mostly, maybe small changes would be needed) - you'd simply have to have an output in the composite action to report a status.

there is a LOT of shell in this workflow. moving that to a different repo would make fixing/updating it much easier.

looking it through, i don't think it's an easy task (mainly due to the checks/outputs you have) but i think it's worthwhile nonetheless for the reason i stated above. it would require some care in the composite action (mainly around inputs/outputs), but again - i feel it's worth the effort and will make maintaining this workflow much easier.

@saralab saralab marked this pull request as ready for review January 16, 2024 21:37
@wileyj
Copy link
Contributor

wileyj commented Jan 17, 2024

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323

even more peculiar is that it tried to download a musl package to satisfy:

info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 17, 2024

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323

even more peculiar is that it tried to download a musl package to satisfy:

info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

I've reviewed past worflow executions and noticed that this issue has been consistently present in all of them. I believe it will work alright, but i could change it to https://github.com/cargo-bins/cargo-binstall?tab=readme-ov-file#in-github-actions. The interesting part is that binstall and also the mutants docs recommend using the one I've added.

@wileyj
Copy link
Contributor

wileyj commented Jan 17, 2024

this is interesting @ASuciuX : https://github.com/stacks-network/stacks-core/actions/runs/7547829177/job/20548598881?pr=4178#step:2:1323
even more peculiar is that it tried to download a musl package to satisfy:

info: downloading https://github.com/cargo-bins/cargo-binstall/releases/download/v1.4.4/cargo-binstall-x86_64-unknown-linux-musl.tgz
info: verifying sha256 checksum for cargo-binstall-x86_64-unknown-linux-musl.tgz
info: cargo-binstall installed at /home/runner/.cargo/bin/cargo-binstall

I've reviewed past worflow executions and noticed that this issue has been consistently present in all of them. I believe it will work alright, but i could change it to https://github.com/cargo-bins/cargo-binstall?tab=readme-ov-file#in-github-actions. The interesting part is that binstall and also the mutants docs recommend using the one I've added.

my concern is mostly that the workflow failed and nothing else ran as a result:

thread 'main' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/patch-0.7.0/src/parser.rs:84:5:
bug: failed to parse entire input. Remaining: 'diff --git a/stacks-common/src/libcommon.rs b/stacks-common/src/lib.rs

do you know why that error was triggered?

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 18, 2024

@wileyj thanks for pointing it. just fixed it stacks-network/actions#14

it was related to: renaming .rs files & having no mutants found

@wileyj
Copy link
Contributor

wileyj commented Jan 18, 2024

@ASuciuX the last question i have here is timing. if i understand correctly, the idea is that this workflow will run on every PR.
The current action has been running for over 2 hours: https://github.com/stacks-network/stacks-core/actions/runs/7547829177?pr=4178

can you share some timings that we can expect based on different types of PR's?
for example, when we merge to master for a release - the CI in total takes around 45-60 minutes. adding another 2+ hours to that time may not be desirable.

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 19, 2024

I've identified and resolved an issue with the git diff command used in our workflow, which you can review here. The workflow run details are available here.

Previously, the command was erroneously capturing changes not only from the source branch but also from the base branch, starting from the initiation of the PR up to the present. This resulted in an extended runtime of about 3 hours for the mutants process, encompassing all differences on the develop branch from December 15 to January 18.
image
All the shards took less than 1 hour and 10 minutes, except 2 shards:

An enhancement that could significantly optimize our runtime involves dynamically terminating mutant processes based on their initial test durations. For instance, if the initial test for stacks-node takes X minutes, we could set a threshold to automatically stop the process after 1.5X minutes and call that a timeout mutant. Similarly, for stackslib, if the initial test takes Y minutes, the process would halt after 1.5Y minutes. Unfortunately, such a dynamic stopping mechanism isn't currently implemented in cargo mutants.


For context, in previous runs:

  • Small packages typically completed in under 30 minutes, aided by the use of shards.
  • Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
    • Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    • "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    • "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

With the fix applied, the workflow should now correctly process only the relevant changes, leading to more efficient runtimes.

@ASuciuX
Copy link
Contributor Author

ASuciuX commented Jan 19, 2024

On top of the above specified run times, the integration of nextest in cargo-mutants is expected to significantly reduce the time needed for processing caught mutants. For more information, see this discussion.

@wileyj
Copy link
Contributor

wileyj commented Jan 19, 2024

For context, in previous runs:

  • Small packages typically completed in under 30 minutes, aided by the use of shards.

  • Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.

    • Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    • "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    • "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

this is great!
can you add this comment :

- Small packages typically completed in under 30 minutes, aided by the use of shards.
- Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
    - Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
    - "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
    - "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

to the file ./docs/ci-release.md? ideally under a section regarding the mutant workflow

@saralab saralab requested a review from obycode January 22, 2024 15:41
Copy link
Contributor

@wileyj wileyj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4178 (comment)

i think one this documentation comment is addressed, everything else looks good

Copy link
Contributor

@wileyj wileyj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shipit

docs/ci-release.md Outdated Show resolved Hide resolved
docs/ci-release.md Outdated Show resolved Hide resolved
@ASuciuX ASuciuX merged commit bb6e887 into stacks-network:develop Jan 26, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants