tests: parametrize bench mark tests #4974

Open · wants to merge 4 commits into base: main

Conversation

@cm-iwata (Contributor) commented on Dec 30, 2024

In the previous implementation, the timeout value had to be adjusted every time a benchmark test was added.
By parametrizing the benchmark tests, the time required for each test becomes predictable, eliminating the need to adjust the timeout value.

Changes

Parametrize the test by the list of criterion benchmarks.

By parametrizing the tests, git clone will be executed for each parameter, as shown here:

```python
with TemporaryDirectory() as tmp_dir:
    dir_a = git_clone(Path(tmp_dir) / a_revision, a_revision)
    result_a = test_runner(dir_a, True)
    if b_revision:
        dir_b = git_clone(Path(tmp_dir) / b_revision, b_revision)
    else:
        # By default, pytest execution happens inside the `tests` subdirectory. Pass the repository root, as
        # documented.
        dir_b = Path.cwd().parent
    result_b = test_runner(dir_b, False)
    comparison = comparator(result_a, result_b)
    return result_a, result_b, comparison
```

Running all parametrized tests with a single git clone would require major revisions to git_ab_test, so this PR does not address that issue.
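For illustration, a rough sketch of the parametrization idea (a hedged sketch only: `get_benchmark_executables`, the test name, and the cargo JSON parsing below are assumptions for this example, not the exact code added in this PR):

```python
import json
import platform
import subprocess

import pytest


def get_benchmark_executables():
    """Pre-build the criterion benchmarks and return their binary paths (illustrative helper)."""
    result = subprocess.run(
        [
            "cargo", "bench", "--all", "--quiet", "--no-run",
            "--target", f"{platform.machine()}-unknown-linux-musl",
            "--message-format", "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    # cargo emits one JSON message per line; benchmark binaries show up as
    # compiler-artifact messages whose target kind is "bench".
    return [
        msg["executable"]
        for msg in map(json.loads, result.stdout.splitlines())
        if msg.get("reason") == "compiler-artifact"
        and msg.get("executable")
        and "bench" in msg.get("target", {}).get("kind", [])
    ]


@pytest.mark.parametrize("executable", get_benchmark_executables())
def test_benchmark(executable):
    # One benchmark binary per test case keeps each case's duration predictable,
    # so no per-test timeout tuning is needed.
    subprocess.run([executable, "--bench"], check=True)
```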

Reason

close #4832

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

In the previous implementation, the timeout value had to be adjusted every time
a benchmark test was added.
By parametrizing the benchmark tests, the time required for each test
becomes predictable, eliminating the need to adjust the timeout value.

Signed-off-by: Tomoya Iwata <[email protected]>
@pb8o added the python label (Pull requests that update Python code) on Jan 8, 2025

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.93%. Comparing base (a5ffb7a) to head (3222c42).

Current head 3222c42 differs from pull request most recent head 215af23

Please upload reports for the commit 215af23 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4974      +/-   ##
==========================================
- Coverage   83.95%   83.93%   -0.02%     
==========================================
  Files         248      248              
  Lines       27839    27791      -48     
==========================================
- Hits        23371    23327      -44     
+ Misses       4468     4464       -4     
Flag Coverage Δ
5.10-c5n.metal 84.51% <ø> (-0.01%) ⬇️
5.10-m5n.metal 84.49% <ø> (-0.02%) ⬇️
5.10-m6a.metal 83.78% <ø> (-0.01%) ⬇️
5.10-m6g.metal 80.61% <ø> (-0.03%) ⬇️
5.10-m6i.metal 84.50% <ø> (-0.01%) ⬇️
5.10-m7g.metal 80.61% <ø> (-0.03%) ⬇️
6.1-c5n.metal 84.51% <ø> (-0.01%) ⬇️
6.1-m5n.metal 84.50% <ø> (-0.01%) ⬇️
6.1-m6a.metal 83.78% <ø> (-0.02%) ⬇️
6.1-m6g.metal 80.61% <ø> (-0.03%) ⬇️
6.1-m6i.metal 84.49% <ø> (-0.02%) ⬇️
6.1-m7g.metal 80.61% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.


pb8o previously approved these changes on Jan 8, 2025.
```python
)

executables = []
for line in stdout.split("\n"):
```

nit: could be stdout.splitlines()
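(Illustrative aside, not from the PR: splitlines() is the nicer idiom here because split("\n") leaves a trailing empty entry when the output ends with a newline.)

```python
stdout = "block_request\nqueue\n"
print(stdout.split("\n"))   # ['block_request', 'queue', ''] -- trailing empty string
print(stdout.splitlines())  # ['block_request', 'queue']
```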


@roypat left a comment:


I think like this we're no longer doing an A/B test; we're just benchmarking the same binary, compiled from the PR branch, twice (i.e. comparing the PR results to themselves).

```diff
 @pytest.mark.no_block_pr
-@pytest.mark.timeout(900)
+@pytest.mark.timeout(600)
 def test_no_regression_relative_to_target_branch():
```

from the buildkite run, it seems like the longest duration of one of these is 150s for the queue benchmarks, so I think we can actually drop this timeout marker altogether and just rely on the default timeout specified in pytest.ini (which is 300s)


Fixed in 215af23.

Comment on lines +29 to +32
```python
_, stdout, _ = cargo(
    "bench",
    f"--all --quiet --target {platform.machine()}-unknown-linux-musl --message-format json --no-run",
)
```

Mhh, I don't think this does what we want. We precompile the executables once (from the PR branch), and then we use these precompiled executables for both the A and B runs. What we need to do instead is compile each benchmark twice, once from the main branch and once from the PR branch, so that this test does a meaningful comparison :/ That's why in #4832 I suggested using --list-only or something: determine the names of the benchmarks here, and then compile them twice in _run_criterion.
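As a rough illustration of that suggestion (a sketch only; it assumes git_ab_test passes _run_criterion the A or B checkout directory, and the helper shown is not the PR's actual implementation):

```python
import subprocess


def _run_criterion(firecracker_checkout, benchmark_name):
    """Build and run one named criterion benchmark inside the given checkout.

    Because cargo runs inside the A or B checkout, the benchmark is compiled
    from that revision's sources instead of reusing a binary precompiled from
    the PR branch.
    """
    return subprocess.run(
        ["cargo", "bench", "--all", "--", benchmark_name],
        cwd=firecracker_checkout,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```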


Sorry, I think I misunderstood a bit how it works.

Let me confirm the modifications.
First, run cargo bench --all -- --list to generate parameters to pass to pytest.parametrize.
I will get the following output:

```console
root@90de30508db0:/firecracker# cargo bench --all -- --list
    Finished `bench` profile [optimized] target(s) in 0.10s
     Running benches/block_request.rs (build/cargo_target/release/deps/block_request-2e4b90407b22a8d0)
request_parse: benchmark

     Running benches/cpu_templates.rs (build/cargo_target/release/deps/cpu_templates-cd18fd51dbad16f4)
Deserialization test - Template size (JSON string): [2380] bytes.
Serialization test - Template size: [72] bytes.
deserialize_cpu_template: benchmark
serialize_cpu_template: benchmark

     Running benches/memory_access.rs (build/cargo_target/release/deps/memory_access-741f97a7c9c33391)
page_fault: benchmark
page_fault #2: benchmark

     Running benches/queue.rs (build/cargo_target/release/deps/queue-b2dfffbab00c4157)
next_descriptor_16: benchmark
queue_pop_16: benchmark
queue_add_used_16: benchmark
queue_add_used_256: benchmark
```

Then I will extract each benchmark name from that output, for example queue_pop_16.

Finally, run a command like cargo bench --all -- queue_pop_16 in _run_criterion.
Is this correct?
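A small sketch of that discovery step (illustrative only, assuming the --list output format shown above; not the final implementation):

```python
import subprocess


def get_benchmark_names():
    """Parse `cargo bench --all -- --list` into benchmark names, e.g. queue_pop_16."""
    output = subprocess.run(
        ["cargo", "bench", "--all", "--", "--list"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # criterion lists every benchmark as "<name>: benchmark"
    return [
        line.removesuffix(": benchmark").strip()
        for line in output.splitlines()
        if line.endswith(": benchmark")
    ]
```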

use `splitlines()` instead of `split("\n")`.

Signed-off-by: Tomoya Iwata <[email protected]>
No longer need to set individual timeout values,
because the performance tests are now parameterized.

Signed-off-by: Tomoya Iwata <[email protected]>
Labels
python (Pull requests that update Python code)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parametrize test_benchmarks.py test by criterion benchmarks
3 participants