test(performance): make tests more deterministic by relying more on system counts #5786

Hweinstock · 2024-10-15T18:25:08Z

Problem

Performance tests are currently flaky, making them a poor signal for performance regressions within the code base. Additionally, false alarms slow down development of unrelated features as test failures keep CI from being "green".

Rather than relying only on system usage thresholds to detect performance regressions, we can count the number of high-risk / potentially slow operations the code makes as a deterministic measure of its performance. However, directly counting each individual slow operation in the performance tests tightly couples the tests to implementation details, making them less effective when the implementation changes.

Therefore, the goals of this PR are the following:

1. Decrease performance test flakiness by increasing thresholds.
2. Increase performance test effectiveness by relying on deterministic measures.
3. Avoid coupling the tests to the implementation details.

Solution

To meet goal (1), we increase the thresholds of the tests to decrease the chances of a false alarm.

To meet goal (2), we count expensive operations. To avoid tying the tests to implementation details, we compare those counts against somewhat loose upper bounds. Thus, implementation changes that modify the exact number of expensive operations by a small constant do not set off a false alarm. However, if they increase the number of expensive operations by a multiplicative factor, the upper bound will alert us.

As an example, we don't want the test to fail if it makes 5-10 more file system calls when working with a few hundred files, but we do want the test to fail if it makes 2x the number of file system calls. Therefore, in the code the bounds are often described as "operations per file", since it is the multiplicative increase we are concerned about. This allows us to achieve goal (3).
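A minimal sketch of the "operations per file" idea, using made-up numbers. The helper name `isWithinPerFileBound` and the figures (250 files, roughly 3 calls per file, a limit of 4) are assumptions for illustration and are not taken from the PR.

```typescript
// Hypothetical helper: returns true when the total call count stays
// under the allowed number of calls per file.
function isWithinPerFileBound(totalCalls: number, fileCount: number, callsPerFile: number): boolean {
    return totalCalls <= fileCount * callsPerFile
}

const fileCount = 250
// Assumed baseline: the code under test makes about 3 fs calls per file.
const baselineCalls = 3 * fileCount // 750

// A small constant increase (10 extra calls) stays under 4 calls per file.
const smallIncreasePasses = isWithinPerFileBound(baselineCalls + 10, fileCount, 4)

// Doubling the calls per file (1500 total) exceeds the bound of 1000.
const doubledPasses = isWithinPerFileBound(baselineCalls * 2, fileCount, 4)
```

Here `smallIncreasePasses` is `true` (760 <= 1000) and `doubledPasses` is `false` (1500 > 1000), which is the behavior the paragraph above asks for.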

Implementation Details

The most common "expensive operation" we count is file system calls (through our fs module). Some other examples include the use of zip libraries or loading large files into memory.
We separate the upper bounds for the file system into read and write bounds. This granularity allows us to assert that specific code paths do not modify any files.
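The read/write split could look roughly like the sketch below. It uses a plain record of call counts as a stand-in for the spy on the fs module. Only `readFileBytes` appears in this PR's diff; the other method names (`readFileText`, `stat`, `writeFile`, `mkdir`) and the numbers are assumptions.

```typescript
// Stand-in for the per-method call counts a spy would expose.
// Only readFileBytes is taken from the PR; the rest are assumed names.
interface FsCallCounts {
    readFileBytes: number
    readFileText: number
    stat: number
    writeFile: number
    mkdir: number
}

// Sum of every call that only reads from the file system.
function readsUpperBound(counts: FsCallCounts): number {
    return counts.readFileBytes + counts.readFileText + counts.stat
}

// Sum of every call that may modify the file system.
function writesUpperBound(counts: FsCallCounts): number {
    return counts.writeFile + counts.mkdir
}

// Made-up counts for a run over 100 files.
const observed: FsCallCounts = { readFileBytes: 120, readFileText: 80, stat: 100, writeFile: 0, mkdir: 0 }
const filesProcessed = 100

const readsWithinBound = readsUpperBound(observed) <= filesProcessed * 4 // 300 <= 400
const madeNoWrites = writesUpperBound(observed) === 0
```

Keeping the two sums separate is what lets a test assert `madeNoWrites` for a code path that should be read-only, independent of how loose the read bound is.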

Notes

AdmZip is removed in refactor: replace archiver with @zip.js/zip.js #4769, so we don't address spying on it here. Once zip.js is in place, we can spy on it in a follow-up.

License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

justinmk3 · 2024-10-15T22:02:44Z

The most common "expensive operation" we count is file system calls (through our fs module).

We also have a fetch (http/network calls) abstraction which could be useful in the future.

justinmk3 · 2024-10-15T22:03:30Z

packages/core/src/amazonqFeatureDev/util/files.ts

@@ -28,17 +29,17 @@ export async function prepareRepoData(
    repoRootPaths: string[],
    workspaceFolders: CurrentWsFolders,
    telemetry: TelemetryHelper,
-    span: Span<AmazonqCreateUpload>
+    span: Span<AmazonqCreateUpload>,
+    zip: AdmZip = new AdmZip()


FYI: AdmZip will be replaced by zip.js #4769

ah, I see. Do you recommend I remove the AdmZip call counting in the meantime or leave it until that other PR is done?

Went ahead and deleted the AdmZip work. Once a new zip is implemented, this can be augmented to spy on it as needed.

#4769 is merged now

Hweinstock · 2024-10-16T17:07:00Z

/runIntegrationTests

jpinkney-aws · 2024-10-22T15:14:48Z

packages/core/src/testInteg/perf/collectFiles.test.ts

+    return performanceTest(
+        {
+            darwin: {
+                userCpuUsage: 100,


it looks like a lot of these values are just declared once and then used in darwin/linux/win32. Does it make sense to just have a field called default or something that can be passed in and used as the default for everything?

Yeah, now that the thresholds are looser I think it makes sense to factor this out to reduce clutter. I wrote a helper function to reduce this.
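For readers following along, a helper of this kind might look like the sketch below. This is not the helper from the PR: the name `withDefaultThresholds`, the `duration` field, and the override mechanism are assumptions. Only `userCpuUsage` and the `darwin`/`linux`/`win32` keys come from the diff above.

```typescript
// Assumed threshold shape; only userCpuUsage appears in the PR's diff.
interface Thresholds {
    userCpuUsage: number
    duration: number
}

type Platform = 'darwin' | 'linux' | 'win32'
type PlatformOverrides = Partial<Record<Platform, Partial<Thresholds>>>

// Hypothetical helper: apply one set of defaults to every platform,
// with optional per-platform overrides layered on top.
function withDefaultThresholds(defaults: Thresholds, overrides: PlatformOverrides = {}): Record<Platform, Thresholds> {
    return {
        darwin: { ...defaults, ...overrides.darwin },
        linux: { ...defaults, ...overrides.linux },
        win32: { ...defaults, ...overrides.win32 },
    }
}

// Same thresholds everywhere, except a longer allowed duration on win32.
const perfOptions = withDefaultThresholds({ userCpuUsage: 100, duration: 10 }, { win32: { duration: 20 } })
```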

jpinkney-aws · 2024-10-22T15:20:43Z

packages/core/src/testInteg/perf/prepareRepoData.test.ts

    assert.ok(result)
    assert.strictEqual(Buffer.isBuffer(result.zipFileBuffer), true)
    assert.strictEqual(telemetry.repositorySize, expectedSize)
    assert.strictEqual(result.zipFileChecksum.length, 44)
+
+    assert.ok(getFsCallsUpperBound(setup.fsSpy) <= setup.numFiles * 4, 'total system calls should be under 4 per file')


I see a lot of different numbers that we are comparing the upper bound to. Out of curiosity how did you determine the number to use?

If t is the upper bound method (getFsCallsUpperBound), and n is the number of files it's working with, I noticed that t(n) is always linear (at least with these tests). So we have something like t(n) = an + c for each test for some values a and c.

To decide on the upper bound for the test, I found the true values of a and c for each test, then basically just took (a+1)n as the upper bound. Usually c is pretty small (<5), so we would really have to increase the number of fs calls on a per file basis to break this bound.
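A rough sketch of that procedure, with invented measurements. The function names and the data points (100 files giving 303 calls, 200 files giving 603 calls) are assumptions for illustration, not values from the tests in this PR.

```typescript
// Recover a and c in t(n) = a*n + c from two (fileCount, callCount) measurements.
function fitLinear(n1: number, t1: number, n2: number, t2: number): { a: number; c: number } {
    const a = (t2 - t1) / (n2 - n1)
    const c = t1 - a * n1
    return { a, c }
}

// The bound described above: allow one extra call per file.
function looseUpperBound(a: number, fileCount: number): number {
    return (a + 1) * fileCount
}

// Made-up measurements: 100 files -> 303 calls, 200 files -> 603 calls.
const fit = fitLinear(100, 303, 200, 603) // a = 3, c = 3

const testFileCount = 250
const expectedCalls = fit.a * testFileCount + fit.c // 753
const allowedCalls = looseUpperBound(fit.a, testFileCount) // 1000
const boundHolds = expectedCalls <= allowedCalls
```

Since (a + 1)n >= an + c exactly when n >= c, the bound is safe as long as the test uses at least c files, which holds easily when c is below 5 and the tests use hundreds of files.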

Hweinstock · 2024-10-22T17:54:22Z

/runIntegrationTests

jpinkney-aws

Looking forward to seeing ci green again 😎

jpinkney-aws · 2024-10-23T14:22:20Z

packages/core/src/testInteg/perf/collectFiles.test.ts

+    afterEach(function () {
+        sinon.restore()
+    })
+    performanceTestWrapper(10)


I like this approach of simplifying it this way!

justinmk3 · 2024-10-23T19:31:15Z

packages/core/src/testInteg/perf/utilities.ts

+ */
+export function getFsReadsUpperBound(fsSpy: sinon.SinonSpiedInstance<FileSystem>): number {
+    return (
+        fsSpy.readFileBytes.callCount +


Just a thought: do we actually need spies or could our fs.ts module just increment a count? Would that save code, and possibly be more reliable (no test setup required)?
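One way to picture that suggestion is a file-system wrapper that counts its own calls, so tests read the counters directly instead of installing spies. The sketch below is hypothetical: `CountingFs` is not the project's fs.ts module, and it uses an in-memory map as a stand-in for real disk access.

```typescript
// Hypothetical wrapper that increments a counter on every call.
// An in-memory Map stands in for the real file system.
class CountingFs {
    private readonly files = new Map<string, string>()
    readCount = 0
    writeCount = 0

    writeFile(path: string, content: string): void {
        this.writeCount++
        this.files.set(path, content)
    }

    readFileText(path: string): string {
        this.readCount++
        const content = this.files.get(path)
        if (content === undefined) {
            throw new Error(`file not found: ${path}`)
        }
        return content
    }

    // Tests would call this between runs so counts do not leak across cases.
    resetCounts(): void {
        this.readCount = 0
        this.writeCount = 0
    }
}

const countingFs = new CountingFs()
countingFs.writeFile('a.txt', 'hello')
const firstRead = countingFs.readFileText('a.txt')
const secondRead = countingFs.readFileText('a.txt')
```

The trade-off is that the counters live in production code and need an explicit reset between tests, whereas spies are installed and restored entirely from the test side.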

Hweinstock added 30 commits October 7, 2024 12:56

move performance test for prepareRepo to integ

3f765c5

move security scan test

5a0357a

split up perf and non perf tests

e07593f

remove shared code to utils

3c1f909

move out more shared code

f5d6c60

Merge branch 'master' into pTests/moveToInteg

2897aee

move some files around

caf2be8

update imports

658ed5a

fix test changes

366b038

delete duplicate test file

1786ff4

fix tests again

3b987b7

fix tests again

56819a5

resolve conflicts

a13a6c7

initial work

da6c8ce

delete unneeded code

b3619a9

Merge branch 'master' into pTest/systemSpy

f9ed46d

Merge branch 'master' into pTests/moveToInteg

dbefbf0

move tests into testPerf

685924c

rename folder

0aa5546

implement spy

b0b63ad

increase thresholds

4fe7c7b

Merge branch 'pTests/moveToInteg' into pTest/systemSpy

f5947c1

build shared utility

fda96df

refactor collectFiles

10130a6

add spies for startSecScan

55ee15e

add system spies for zipCode

10236d8

merge in master

a6eda5b

Merge branch 'master' into pTests/systemSpy

b8ed1f4

add filesystem spy

efe25af

Merge branch 'master' into pTests/systemSpy

74dc701

Hweinstock added 2 commits October 15, 2024 13:18

add spy to file hash test

3a11a85

add spy for vfs

1a2017b

Hweinstock changed the title ~~tests(performance): make tests more deterministic by relying more on system counts~~ test(performance): make tests more deterministic by relying more on system counts Oct 15, 2024

Hweinstock added 2 commits October 15, 2024 17:41

adjust thresholds

5e2d976

remove temp debump to 1 testrun

157c628

justinmk3 reviewed Oct 15, 2024

adjust thresholds

6ecc708

Hweinstock marked this pull request as ready for review October 16, 2024 17:34

Hweinstock requested review from a team as code owners October 16, 2024 17:34

Hweinstock added 3 commits October 21, 2024 13:56

Merge branch 'master' into pTests/systemSpy

fdc2010

remove AdmZip work

870a644

remove unused variable

71f7eef

jpinkney-aws reviewed Oct 22, 2024

use common function to reduce clutter

4a253da

jpinkney-aws approved these changes Oct 23, 2024

Hweinstock merged commit 16d19c1 into aws:master Oct 23, 2024
30 checks passed

Hweinstock deleted the pTests/systemSpy branch October 23, 2024 18:09

justinmk3 reviewed Oct 23, 2024

Hweinstock mentioned this pull request Nov 21, 2024

unreliable test: startSecurityScanPerformanceTest #5479

Closed
