Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutants run on PR differences #4303

Merged
merged 3 commits into from
Jan 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
139 changes: 139 additions & 0 deletions .github/workflows/pr-differences-mutants.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
name: PR Differences Mutants

on:
pull_request:
types:
- opened
- reopened
- synchronize
- ready_for_review
paths:
- "**.rs"

concurrency:
group: pr-differences-${{ github.head_ref || github.ref || github.run_id }}
# Always cancel duplicate jobs
cancel-in-progress: true

jobs:
# Check and output whether to run big (`stacks-node`/`stackslib`) or small (others) packages with or without shards
check-big-packages-and-shards:
name: Check Packages and Shards

runs-on: ubuntu-latest

outputs:
run_big_packages: ${{ steps.check_packages_and_shards.outputs.run_big_packages }}
big_packages_with_shards: ${{ steps.check_packages_and_shards.outputs.big_packages_with_shards }}
run_small_packages: ${{ steps.check_packages_and_shards.outputs.run_small_packages }}
small_packages_with_shards: ${{ steps.check_packages_and_shards.outputs.small_packages_with_shards }}

steps:
- id: check_packages_and_shards
uses: stacks-network/actions/stacks-core/mutation-testing/check-packages-and-shards@main

# Mutation testing - Execute on PR on small packages that have functions modified (normal run, no shards)
pr-differences-mutants-small-normal:
name: Mutation Testing - Normal, Small

needs: check-big-packages-and-shards

if: ${{ needs.check-big-packages-and-shards.outputs.run_small_packages == 'true' && needs.check-big-packages-and-shards.outputs.small_packages_with_shards == 'false' }}

runs-on: ubuntu-latest

steps:
- name: Run mutants on diffs
uses: stacks-network/actions/stacks-core/mutation-testing/pr-differences@main
with:
package-dimension: "small"

# Mutation testing - Execute on PR on small packages that have functions modified (run with strategy matrix shards)
pr-differences-mutants-small-shards:
name: Mutation Testing - Shards, Small

needs: check-big-packages-and-shards

if: ${{ needs.check-big-packages-and-shards.outputs.run_small_packages == 'true' && needs.check-big-packages-and-shards.outputs.small_packages_with_shards == 'true' }}

runs-on: ubuntu-latest

strategy:
fail-fast: false
matrix:
shard: [0, 1, 2, 3]

steps:
- name: Run mutants on diffs
uses: stacks-network/actions/stacks-core/mutation-testing/pr-differences@main
with:
shard: ${{ matrix.shard }}
package-dimension: "small"

# Mutation testing - Execute on PR on big packages that have functions modified (normal run, no shards)
pr-differences-mutants-big-normal:
name: Mutation Testing - Normal, Big

needs: check-big-packages-and-shards

if: ${{ needs.check-big-packages-and-shards.outputs.run_big_packages == 'true' && needs.check-big-packages-and-shards.outputs.big_packages_with_shards == 'false' }}

runs-on: ubuntu-latest

steps:
- name: Run Run mutants on diffs
env:
BITCOIND_TEST: 1
RUST_BACKTRACE: full
uses: stacks-network/actions/stacks-core/mutation-testing/pr-differences@main
with:
package-dimension: "big"

# Mutation testing - Execute on PR on big packages that have functions modified (run with strategy matrix shards)
pr-differences-mutants-big-shards:
name: Mutation Testing - Shards, Big

needs: check-big-packages-and-shards

if: ${{ needs.check-big-packages-and-shards.outputs.run_big_packages == 'true' && needs.check-big-packages-and-shards.outputs.big_packages_with_shards == 'true' }}

runs-on: ubuntu-latest

strategy:
fail-fast: false
matrix:
shard: [0, 1, 2, 3, 4, 5, 6, 7]

steps:
- name: Run mutants on diffs
env:
BITCOIND_TEST: 1
RUST_BACKTRACE: full
uses: stacks-network/actions/stacks-core/mutation-testing/pr-differences@main
with:
shard: ${{ matrix.shard }}
package-dimension: "big"

# Output the mutants and fail the workflow if there are missed/timeout/unviable mutants
output-mutants:
name: Output Mutants

runs-on: ubuntu-latest

needs:
[
check-big-packages-and-shards,
pr-differences-mutants-small-normal,
pr-differences-mutants-small-shards,
pr-differences-mutants-big-normal,
pr-differences-mutants-big-shards,
]

steps:
- name: Output Mutants
uses: stacks-network/actions/stacks-core/mutation-testing/output-pr-mutants@main
with:
big_packages: ${{ needs.check-big-packages-and-shards.outputs.run_big_packages }}
shards_for_big_packages: ${{ needs.check-big-packages-and-shards.outputs.big_packages_with_shards }}
small_packages: ${{ needs.check-big-packages-and-shards.outputs.run_small_packages }}
shards_for_small_packages: ${{ needs.check-big-packages-and-shards.outputs.small_packages_with_shards }}
95 changes: 95 additions & 0 deletions docs/ci-release.md
Original file line number Diff line number Diff line change
Expand Up @@ -228,4 +228,99 @@ ex: Branch is named `develop` and the PR is numbered `113`
- `stacks-core:2.1.0.0.0`
- `stacks-core:latest`

## Mutation Testing

When a new Pull Request (PR) is submitted, this feature evaluates the quality of the tests added or modified in the PR.
It checks the new and altered functions through mutation testing.
Mutation testing involves making small changes (mutations) to the code to check if the tests can detect these changes.

The mutations are run with or without a [Github Actions matrix](https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs).
The matrix is used when there is a large number of mutations to run ([check doc specific cases](https://github.com/stacks-network/actions/blob/main/stacks-core/mutation-testing/check-packages-and-shards/README.md#outputs)).
We utilize a matrix strategy with shards to enable parallel execution in GitHub Actions.
This approach allows for the concurrent execution of multiple jobs across various runners.
The total workload is divided across all shards, effectively reducing the overall duration of a workflow because the time taken is approximately the total time divided by the number of shards (+ initial build & test time).
This is particularly advantageous for large packages that have significant build and test times, as it enhances efficiency and speeds up the process.

Since mutation testing is directly correlated to the written tests, there are slower packages (due to the quantity or time it takes to run the tests) like `stackslib` or `stacks-node`.
These mutations are run separately from the others, with one or more parallel jobs, depending on the amount of mutations found.

Once all the jobs have finished testing mutants, the last job collects all the tested mutations from the previous jobs, combines them and outputs them to the `Summary` section of the workflow, at the bottom of the page.
There, you can find all mutants on categories, with links to the function they tested, and a short description on how to fix the issue.
The PR should only be approved/merged after all the mutants tested are in the `Caught` category.

### Time required to run the workflow based on mutants outcome and packages' size

- Small packages typically completed in under 30 minutes, aided by the use of shards.
- Large packages like stackslib and stacks-node initially required about 20-25 minutes for build and test processes.
- Each "missed" and "caught" mutant took approximately 15 minutes. Using shards, this meant about 50-55 minutes for processing around 32 mutants (10-16 functions modified). Every additional 8 mutants added another 15 minutes to the runtime.
- "Unviable" mutants, which are functions lacking a Default implementation for their returned struct type, took less than a minute each.
- "Timeout" mutants typically required more time. However, these should be marked to be skipped (by adding a skip flag to their header) since they indicate functions unable to proceed in their test workflow with mutated values, as opposed to the original implementations.

File:

- [PR Differences Mutants](../.github/workflows/pr-differences-mutants.yml)

### Mutant Outcomes

- caught — A test failed with this mutant applied.
This is a good sign about test coverage.

- missed — No test failed with this mutation applied, which seems to indicate a gap in test coverage.
Or, it may be that the mutant is undistinguishable from the correct code.
In any case, you may wish to add a better test.

- unviable — The attempted mutation doesn't compile.
This is inconclusive about test coverage, since the function's return structure may not implement `Default::default()` (one of the mutations applied), hence causing the compile to fail.
It is recommended to add `Default` implementation for the return structures of these functions, only mark that the function should be skipped as a last resort.

- timeout — The mutation caused the test suite to run for a long time, until it was eventually killed.
You might want to investigate the cause and only mark the function to be skipped if necessary.

### Skipping Mutations

Some functions may be inherently hard to cover with tests, for example if:

- Generated mutants cause tests to hang.
- You've chosen to test the functionality by human inspection or some higher-level integration tests.
- The function has side effects or performance characteristics that are hard to test.
- You've decided that the function is not important to test.

To mark functions as skipped, so they are not mutated:

- Add a Cargo dependency of the [mutants](https://crates.io/crates/mutants) crate, version `0.0.3` or later (this must be a regular `dependency`, not a `dev-dependency`, because the annotation will be on non-test code) and mark functions with `#[mutants::skip]`, or

- You can avoid adding the dependency by using the slightly longer `#[cfg_attr(test, mutants::skip)]`.

### Example

```rust
use std::time::{Duration, Instant};

/// Returns true if the program should stop
#[cfg_attr(test, mutants::skip)] // Returning false would cause a hang
fn should_stop() -> bool {
true
}

pub fn controlled_loop() {
let start = Instant::now();
for i in 0.. {
println!("{}", i);
if should_stop() {
break;
}
if start.elapsed() > Duration::from_secs(60 * 5) {
panic!("timed out");
}
}
}

mod test {
#[test]
fn controlled_loop_terminates() {
super::controlled_loop()
}
}
```

---