tests: parametrize bench mark tests #4974
In the previous implementation, the timeout value had to be adjusted every time a benchmark test was added. Parametrizing the benchmark tests makes the time required for each test predictable, which eliminates the need to adjust the timeout value.
Signed-off-by: Tomoya Iwata <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #4974 +/- ##
==========================================
+ Coverage 83.10% 83.93% +0.82%
==========================================
Files 245 248 +3
Lines 26723 27791 +1068
==========================================
+ Hits 22209 23327 +1118
+ Misses 4514 4464 -50
)

executables = []
for line in stdout.split("\n"):
nit: could be stdout.splitlines()
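A quick illustration of why this nit matters, as a minimal sketch. The sample `stdout` string below is made up for the example and is not actual cargo output.

```python
# Hypothetical sample text; it stands in for the captured cargo stdout.
stdout = "first\nsecond\n"

# split("\n") leaves a trailing empty string when the text ends with a newline,
# so a loop over the result has to skip or tolerate an empty "line".
by_split = stdout.split("\n")

# splitlines() does not produce that empty tail, and it also understands
# "\r\n" line endings.
by_splitlines = stdout.splitlines()

print(by_split)
print(by_splitlines)
```

With `split("\n")` the loop body sees one extra empty entry; with `splitlines()` it only sees the real lines.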
Fixed in b95900e.
I also realized there are many instances of the same pattern:
https://github.com/search?q=repo%3Afirecracker-microvm%2Ffirecracker%20split(%22%5Cn%22)&type=code
I think that with this change we're no longer doing an A/B test; we're just benchmarking the same binary, compiled from the PR branch, twice (i.e. comparing the PR results to themselves).
@pytest.mark.no_block_pr
@pytest.mark.timeout(900)
def test_no_regression_relative_to_target_branch():
@pytest.mark.timeout(600)
From the buildkite run, it seems the longest duration of any of these is 150s, for the queue benchmarks. So I think we can actually drop this timeout marker altogether and just rely on the default timeout specified in pytest.ini (which is 300s).
Fixed in 215af23.
_, stdout, _ = cargo(
"bench",
f"--all --quiet --target {platform.machine()}-unknown-linux-musl --message-format json --no-run",
)
Mhh, I don't think this does what we want. We precompile the executables once (from the PR branch), and then we use this precompiled executable for both the A and B runs. What we need to do, though, is compile each benchmark twice, once from the main branch and once from the PR branch, so that this test does a meaningful comparison :/ That's why in #4832 I suggested using --list-only or something: determine the names of the benchmarks here, and then compile them twice in _run_criterion.
Sorry, I think I misunderstood a bit how it works.
Let me confirm the modifications.
First, run `cargo bench --all -- --list` to generate the parameters to pass to `pytest.mark.parametrize`.
I will get the following output:
root@90de30508db0:/firecracker# cargo bench --all -- --list
Finished `bench` profile [optimized] target(s) in 0.10s
Running benches/block_request.rs (build/cargo_target/release/deps/block_request-2e4b90407b22a8d0)
request_parse: benchmark
Running benches/cpu_templates.rs (build/cargo_target/release/deps/cpu_templates-cd18fd51dbad16f4)
Deserialization test - Template size (JSON string): [2380] bytes.
Serialization test - Template size: [72] bytes.
deserialize_cpu_template: benchmark
serialize_cpu_template: benchmark
Running benches/memory_access.rs (build/cargo_target/release/deps/memory_access-741f97a7c9c33391)
page_fault: benchmark
page_fault #2: benchmark
Running benches/queue.rs (build/cargo_target/release/deps/queue-b2dfffbab00c4157)
next_descriptor_16: benchmark
queue_pop_16: benchmark
queue_add_used_16: benchmark
queue_add_used_256: benchmark
Then I will take a benchmark name from it, for example `queue_pop_16`.
Finally, I will run a command like `cargo bench --all -- queue_pop_16` in `_run_criterion`.
Is this correct?
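The name-extraction step described above could look roughly like the following. This is only a sketch: `parse_benchmark_names` is a hypothetical helper, not a function from the Firecracker test framework, and it assumes every benchmark appears on a line ending in `: benchmark`, as in the output quoted above.

```python
from typing import List

# Suffix that marks a benchmark entry in the `--list` output quoted above.
SUFFIX = ": benchmark"


def parse_benchmark_names(list_output: str) -> List[str]:
    """Return benchmark names found in `cargo bench -- --list` style output.

    Hypothetical helper: lines that do not end with the suffix (cargo status
    lines, messages printed by the benchmarks themselves) are ignored.
    """
    names = []
    for line in list_output.splitlines():
        line = line.strip()
        if line.endswith(SUFFIX):
            names.append(line[: -len(SUFFIX)])
    return names


# A few lines copied from the output shown in this thread.
sample = """\
Running benches/queue.rs (build/cargo_target/release/deps/queue-b2dfffbab00c4157)
next_descriptor_16: benchmark
queue_pop_16: benchmark
Deserialization test - Template size (JSON string): [2380] bytes.
page_fault #2: benchmark
"""

benchmark_names = parse_benchmark_names(sample)
print(benchmark_names)
```

The resulting list is the kind of value that could be handed to `pytest.mark.parametrize`, so that each benchmark becomes its own test case.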
Yes, that's pretty much it! The main point is that the compilation of the benchmarks needs to happen in _run_criterion, because we actually have to compile them twice, once for the pull request target, and once for the pull request head.
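To make the point about compiling twice concrete, here is a minimal sketch of the control flow. All names (`run_ab`, `fake_build`, `fake_run`) are hypothetical and stand in for the real checkout, `cargo bench`, and measurement steps; this is not the actual `_run_criterion` or `git_ab_test` code.

```python
from typing import Callable, Dict


def run_ab(
    benchmark: str,
    target_rev: str,
    head_rev: str,
    build: Callable[[str, str], str],
    run: Callable[[str], float],
) -> Dict[str, float]:
    """Build and run one benchmark separately for each revision.

    The build step sits inside the loop, so each revision gets its own
    binary instead of one precompiled binary being reused for both sides.
    """
    results = {}
    for rev in (target_rev, head_rev):
        binary = build(rev, benchmark)  # compiled once per revision
        results[rev] = run(binary)
    return results


# Stand-ins for the real build and run steps, used only for illustration.
build_log = []


def fake_build(rev: str, benchmark: str) -> str:
    build_log.append(rev)
    return f"{benchmark}@{rev}"


def fake_run(binary: str) -> float:
    return float(len(binary))


results = run_ab("queue_pop_16", "main", "pr-head", fake_build, fake_run)
print(build_log)
```

The build log shows two separate builds, one per revision, which is what makes the comparison an A/B test rather than a comparison of the PR branch against itself.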
Use `splitlines()` instead of `split("\n")`. Signed-off-by: Tomoya Iwata <[email protected]>
Individual timeout values no longer need to be set, because the performance tests are now parametrized. Signed-off-by: Tomoya Iwata <[email protected]>
In the previous implementation, the same binary, built from the PR branch, was executed twice, which was not a correct A/B test. This has been fixed. Signed-off-by: Tomoya Iwata <[email protected]>
In the previous implementation, git clone was executed for each parameter of the parametrized test. This has a large overhead, so the fixtures were adjusted to be called only once per class. Signed-off-by: Tomoya Iwata <[email protected]>
In the previous implementation, the timeout value had to be adjusted every time a benchmark test was added.
Parametrizing the benchmark tests makes the time required for each test predictable, which eliminates the need to adjust the timeout value.
Changes
Parametrize the test by the list of criterion benchmarks.
Because the tests are parametrized, git clone will be executed for each parameter here:
firecracker/tests/framework/ab_test.py
Lines 85 to 98 in 70ac154
Running all parametrized tests with a single git clone would require major revisions to `git_ab_test`, so this PR does not address that issue.
Reason
close #4832
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.