Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: improve performance by eliminating redundancies #14028

Merged
merged 4 commits into from
Dec 30, 2024

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Dec 23, 2024

Motivation

There's three sources of redundancy that are slowing down CI builds:

  1. make install is being redundantly run three times for each E2E test:
    1. In the workflow:
      - name: Install manifests
      run: make install PROFILE=${{matrix.profile}} STATIC_FILES=false
    2. As a prerequisite of make start:
      start: install controller kit
    3. As a dependency by Kit:
      dependencies: install build-controller port-forward
  2. Similarly, the changes in ci: optimize tests by only building binaries once #14012 to centralize binary building aren't actually preventing the binaries from being rebuilt in the E2E tests because controller and cli are prerequisites of make start, and listed as prerequisites in Kit.
  3. Go dependencies are being re-downloaded in every job. The actions/setup-go is supposed to prevent that by caching the contents of GOMODCACHE. I think the problem is the cache is usually populated by the docs workflow, which doesn't fetch all the Go dependencies (example), so they aren't included when actions/setup-go saves the cache.
    image

Modifications

This eliminates the first two redundancies by removing the unnecessary prerequisites from the Makefile and the redundant make install step in ci-build.yaml. Also, I added make --touch dist/* to mark everything as up-to-date so the binaries aren't rebuilt (docs). For the issue with the go dependencies not being in the cache, I updated docs.yaml to run go mod download to ensure they're included.

Also, I consolidated Start controller/API and Wait for controller to be up steps so you can see everything Kit is doing in the logs for the former, which will help prevent these kind of oversights in the future. The reason I didn't catch the issue with binaries being rebuilt in #14012 is because the logs are discarded unless the build failed (in which case they were visible in the Failure debug - Controller/API logs step).

Verification

Wait for E2E tests

@MasonM MasonM force-pushed the ci-fix-redundancy branch 2 times, most recently from d50c96b to 0dd009b Compare December 23, 2024 22:13
`make install` is being redundantly run three times for each E2E test:
1. In the workflow: https://github.com/argoproj/argo-workflows/blob/6699ab396f830210f6dcac4f00a9328a629c142f/.github/workflows/ci-build.yaml#L349-L350
2. As a prerequisite of `make start`:
   https://github.com/argoproj/argo-workflows/blob/6699ab396f830210f6dcac4f00a9328a629c142f/Makefile#L547o
3. As a dependency by Kit: https://github.com/argoproj/argo-workflows/blob/6699ab396f830210f6dcac4f00a9328a629c142f/tasks.yaml#L39

Similarly, the changes in
argoproj#14012 to centralize
binary building aren't effective because the binaries are being rebuilt.

This eliminates the redundancy by removing the unnecessary prerequisites
from the `Makefile` and the redundant `make install` step in
`ci-build.yaml`. Also, I added `make --touch dist/*` to mark everything
as up-to-date so the binaries aren't rebuilts (docs:
https://www.gnu.org/software/make/manual/html_node/Instead-of-Execution.html)

Signed-off-by: Mason Malone <[email protected]>
@MasonM MasonM changed the title ci: improve performance by removing redundancy ci: improve performance by eliminating redundancies Dec 23, 2024
Signed-off-by: Mason Malone <[email protected]>
@MasonM
Copy link
Contributor Author

MasonM commented Dec 23, 2024

/retest

@MasonM MasonM marked this pull request as ready for review December 23, 2024 23:50
@tczhao
Copy link
Member

tczhao commented Dec 25, 2024

👍Would you have any metrics on how much performance this improves?

@MasonM
Copy link
Contributor Author

MasonM commented Dec 27, 2024

@tczhao The first two redundancies impact the time for the "Start controller/API" step in the E2E test jobs. The new "Performance Metrics" page can show metrics for the E2E jobs as a whole, but it doesn't look at has step-level granularity. From a very quick look, it appears it reduces the time for that step by 2-3 minutes.

The third redundancy impacts everything Go-related, and the fix in this PR to avoid re-downloading Go dependencies isn't going to help with that until the cache is updated. I think it'll be 10-20 seconds.

Copy link
Member

@tczhao tczhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tczhao tczhao merged commit 08a34ce into argoproj:main Dec 30, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants