
chore(tests): add network perf tests for Retina #772

Open: wants to merge 44 commits into base branch main

@ritwikranjan (Contributor) commented Sep 23, 2024

Description

This pull request introduces several updates related to performance testing, dependency upgrades, and workflow enhancements. The most important changes include the addition of a new performance measurement workflow, updates to dependencies in go.mod, and modifications to the e2e test setup and execution.

Performance Testing Enhancements:

  • Added a new GitHub Actions workflow for network performance measurement that runs every two hours (.github/workflows/perf.yaml).
  • Introduced a new performance test and related functions for gathering and publishing network performance metrics (test/e2e/retina_perf_test.go, test/e2e/scenarios/perf/get-network-performance-measures.go).

Workflow and Configuration Changes:

  • Updated the e2e test command to include a more specific file pattern (.github/workflows/e2e.yaml).
  • Added azure-cli feature to the devcontainer configuration (.devcontainer/devcontainer.json).

Documentation:

  • Added documentation for reading Retina performance test results and the metrics published to Azure App Insights (test/e2e/README.md).

These changes collectively enhance the testing infrastructure, improve dependency management, and provide better documentation for performance testing.


Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.


- Added new performance tests for iperf throughput (TCP and UDP)
- Metrics include CPU Utilization Host, CPU Utilization Remote, Max RTT, Mean RTT, Min RTT, Retransmits, and Total Throughput

This commit introduces new performance tests to measure iperf throughput under various conditions for the Retina project.

Signed-off-by: Ritwik Ranjan <[email protected]>
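The commit message lists seven metrics per iperf run. As a minimal sketch, assuming each run is collected into one record before publishing, the Go code below shows one hypothetical shape for that record plus a sanity check. The type `IperfResult`, its field names, and the units are illustrative assumptions, not the PR's actual types.

```go
package main

import "fmt"

// IperfResult is a hypothetical container for the metrics named in the commit
// message. Field names and units are illustrative assumptions, not the PR's
// actual types.
type IperfResult struct {
	Protocol             string  // "TCP" or "UDP"
	CPUUtilizationHost   float64 // percent
	CPUUtilizationRemote float64 // percent
	MinRTTMs             float64 // milliseconds
	MeanRTTMs            float64 // milliseconds
	MaxRTTMs             float64 // milliseconds
	Retransmits          int
	TotalThroughputMbps  float64 // megabits per second
}

// Validate rejects results that cannot be correct, so bad runs are caught
// before anything is published.
func (r IperfResult) Validate() error {
	if r.Protocol != "TCP" && r.Protocol != "UDP" {
		return fmt.Errorf("unknown protocol %q", r.Protocol)
	}
	if r.MinRTTMs > r.MeanRTTMs || r.MeanRTTMs > r.MaxRTTMs {
		return fmt.Errorf("RTT ordering violated: min=%.2f mean=%.2f max=%.2f",
			r.MinRTTMs, r.MeanRTTMs, r.MaxRTTMs)
	}
	if r.Retransmits < 0 || r.TotalThroughputMbps < 0 {
		return fmt.Errorf("negative value: retransmits=%d throughput=%.2f",
			r.Retransmits, r.TotalThroughputMbps)
	}
	return nil
}

func main() {
	// Made-up sample values, for illustration only.
	r := IperfResult{Protocol: "TCP", MinRTTMs: 0.5, MeanRTTMs: 1.2, MaxRTTMs: 3.4, TotalThroughputMbps: 940}
	fmt.Println(r.Validate()) // <nil>

	bad := r
	bad.MinRTTMs = 5 // a min RTT above the mean RTT is impossible
	fmt.Println(bad.Validate())
}
```

Validating before publishing keeps an obviously broken run from showing up as a data point in the dashboards.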
@ritwikranjan requested a review from a team as a code owner on September 23, 2024 13:28
@ritwikranjan changed the title from "[WIP] chore/ Network perf test for Retina" to "[WIP] chore/tests: add network perf tests for Retina" on Sep 23, 2024
Resolved review thread on test/e2e/retina_perf_test.go (outdated)
Resolved review thread on go.mod (outdated)
Resolved review thread on go.mod (outdated)
Resolved review thread on test/e2e/retina_perf_test.go (outdated)
@ritwikranjan changed the title from "[WIP] chore/tests: add network perf tests for Retina" to "chore/tests: add network perf tests for Retina" on Sep 27, 2024
@ritwikranjan changed the title from "chore/tests: add network perf tests for Retina" to "chore(tests): add network perf tests for Retina" on Sep 27, 2024
@SRodi (Member) left a comment

I ran a test in uksouth and got this:

                                --------------------------------------------------------------------------------
                                RESPONSE 400: 400 Bad Request
                                ERROR CODE: ErrCode_InsufficientVCPUQuota
                                --------------------------------------------------------------------------------
                                {
                                  "code": "ErrCode_InsufficientVCPUQuota",
                                  "details": null,
                                  "message": "Insufficient regional vcpu quota left for location uksouth. left regional vcpu quota 20, requested quota 36",
                                  "subcode": ""
                                }
                                --------------------------------------------------------------------------------
                Test:           TestPerfRetina

I also ran the test in westus2, where quota was not an issue, but I got the following:

2024/09/27 17:48:52 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:54 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:56 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:48:58 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:49:00 DaemonSet is not ready: kube-system/retina-agent. 0 out of 3 expected pods are ready
2024/09/27 17:49:02 Error received when checking status of resource retina-svc. Error: 'client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline', Resource details: 'Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
Name: "retina-svc", Namespace: "kube-system"'
2024/09/27 17:49:02 Retryable error? true
2024/09/27 17:49:02 Retrying as current number of retries 0 less than max number of retries 30
    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:65
                Error:          Received unexpected error:
                                did not expect error from step InstallHelmChart but got error: failed to install chart: context deadline exceeded
                Test:           TestPerfRetina
DeleteResourceGroup setting stored value for parameter [SubscriptionID] set as [......-.....-....-....-.........]
DeleteResourceGroup setting stored value for parameter [ResourceGroupName] set as [srodi-e2e-netobs-1727452628]
DeleteResourceGroup setting stored value for parameter [Location] set as [westus2]
#################### DeleteResourceGroup ################################################################
2024/09/27 17:49:02 deleting resource group "srodi-e2e-netobs-1727452628"...
2024/09/27 17:49:05 resource group "srodi-e2e-netobs-1727452628" deleted successfully
--- FAIL: TestPerfRetina (3269.87s)

FYI @ritwikranjan

@ritwikranjan self-assigned this on Oct 1, 2024
@ritwikranjan added the type/enhancement label (New feature or request) on Oct 1, 2024
Resolved review thread on go.mod (outdated)
Resolved review thread on test/e2e/retina_perf_test.go
Resolved review thread on test/e2e/retina_perf_test.go (outdated)
Resolved review thread on test/e2e/retina_perf_test.go (outdated)
@SRodi (Member) commented Oct 2, 2024

@ritwikranjan I just got another failure due to insufficient quota, this time in centralus. I suggest making sure the test can run in any of the regions in the locations slice ([]string{"eastus2", "centralus", "southcentralus", "uksouth", "centralindia", "westus2"}).

    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:52
                Error:          Received unexpected error:
                                did not expect error from step CreateNPMCluster but got error: failed to finish the create cluster request: PUT https://management.azure.com/subscriptions/....-....-....-....-.........../resourceGroups/srodi-e2e-netobs-1727879517/providers/Microsoft.ContainerService/managedClusters/srodi-e2e-netobs-1727879517
                                --------------------------------------------------------------------------------
                                RESPONSE 400: 400 Bad Request
                                ERROR CODE: ErrCode_InsufficientVCPUQuota
                                --------------------------------------------------------------------------------
                                {
                                  "code": "ErrCode_InsufficientVCPUQuota",
                                  "details": null,
                                  "message": "Insufficient vcpu quota requested 32, remaining 0 for family standardDSv2Family for region centralus.",
                                  "subcode": ""
                                }
                                --------------------------------------------------------------------------------
                Test:           TestE2EPerfRetina
--- FAIL: TestE2EPerfRetina (26.22s)
FAIL
FAIL    command-line-arguments  26.239s
FAIL

@SRodi (Member) left a comment

@ritwikranjan I am getting the following error while running the test on the most recent commit:

    runner.go:27: 
                Error Trace:    /home/srodi/src/retina/test/e2e/framework/types/runner.go:27
                                                        /home/srodi/src/retina/test/e2e/retina_perf_test.go:63
                Error:          Received unexpected error:
                                did not expect error from step GetNetworkPerformanceMeasures but got error: failed to get network performance measures: failed to execute tests: error getting CSV data from orchestrator pod: error reading logs from pod netperf-orch-59dsc: the server rejected our request for an unknown reason (get pods netperf-orch-59dsc)
                Test:           TestE2EPerfRetina

@ritwikranjan (Contributor, Author) commented

This will help with identifying issue #655.

Resolved review thread on .github/workflows/perf.yaml (outdated)
Resolved review thread on test/e2e/retina_perf_test.go (outdated)
Resolved review thread on test/e2e/scenarios/perf/get-perf-regression-results.go (outdated)
Labels: type/enhancement (New feature or request)
Projects: none yet
6 participants