[feature] Build and test V2 driver / launcher images against incoming PRs #11239
Comments
Noting that this was surfaced during development of #11196. We are currently unable to add integration tests assuring the stability of the patch in that PR due to this issue. Thoughts / feedback very welcome as always!
Information from the 9/25/24 community meeting: The current process for releases is for the person cutting the release to verify that workflows on the master branch have completed successfully. There was no mention of workflows being executed against release branches. Image updates to GCR are manually executed as well. The release process is documented here.

It seems that there is consensus on building images and testing them in PRs through overrides. The remaining question is what to do with the overrides at release time. If, for example, a PR such as #11196 were merged in with updated tests to assure that this functionality remains intact, these tests would fail with older versions of the driver / launcher, so options for keeping tests passing still need to be worked out.
The current guidance for releases assumes a pattern most similar to the second option. If the first option were taken, we would need to assure that at some point before release the overrides are disabled and the workflows executed.
Thanks for driving this @CarterFendley! As far as the PR suggestions are concerned, I think your proposal in the OP makes sense. IMO we should just go with setting the env vars.
Is this not currently the case? If the kfp-cluster action runs post-merge against the master branch, then build-images.sh is used there as well. In that scenario the same env vars could be set to reference the kind images. Granted, if this is the flow, then the tests can be a bit misleading, since the hard-coded images are not what we are testing against. What if we just reference the images built in CI? This way you are effectively doing (1), and (2) is handled implicitly if we create an automated build/push to main. We can tie this to the newly added workflow that pushes to GHCR here, and have it run on PR merge with a push to
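A rough sketch of what the env-var approach above could look like inside one of the workflows. The image names, registry path, and tag here are all assumptions for illustration, not values from the actual workflow files:

```yaml
# Hypothetical workflow fragment: point the backend at CI-built driver/launcher
# images instead of the hard-coded gcr.io defaults. Names are assumptions.
env:
  V2_DRIVER_IMAGE: ghcr.io/kubeflow/kfp-driver:${{ github.sha }}
  V2_LAUNCHER_IMAGE: ghcr.io/kubeflow/kfp-launcher:${{ github.sha }}
```

Tagging by `github.sha` (rather than `latest`) would also make it unambiguous which build a given test run exercised.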
Adding a note about the urgency of this change. #11246 introduced a bug which would have been caught by the sample tests if the driver / launcher had been built and tested. Specifically, @zazulam took a two-day side quest finding a solution to the issue 😂 Will work with my team to allow me to pick this up next (hopefully tomorrow).
@CarterFendley just to make sure, are you saying the PR introduced a bug in the main branch, or in the context of the PR? If the latter, can we prioritize a fix as a separate commit/PR? Regardless, let us know what we can do to help support you in this effort; we too feel this is very important!
@HumairAK prior to that PR |
Thanks folks @CarterFendley / @zazulam!
Yes, it sounds like a separate issue which we should fix ASAP if possible. I've created the following issue to track it: Let us know if you guys can take this; we can assign it.
Okay, so the PR above starts work on this. Long term we should probably be using something different than the current approach.

Looping back to the process / design comments raised by @HumairAK: I don't mind that idea. Also, I'm slightly curious about trying to bring build-images.sh and image-builds.yml together; it would be nice to have one solution for this. We already have at least 4 different ways to build the launcher and driver images 😅
Feature Area
/area backend
/area samples
What feature would you like to see?
The V2 backend driver / launcher images being built and tested against incoming PRs through integration tests / etc.
What is the use case or pain point?
Assure stability of driver / launcher.
Is there a workaround currently?
Trust people to test driver / launcher locally.
More details
Currently, the kfp-cluster action, used by the workflows listed below, uses build-images.sh to build a set of images and push them to the kind registry.
e2e-test.yml
kfp-kubernetes-execution-tests.yml
kfp-samples.yml <-- My primary focus at the moment
kubeflow-pipelines-integration-v2.yml
periodic.yml
sdk-execution.yml
upgrade-test.yml
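As a rough illustration of the pattern involved (the registry address, image names, and Dockerfile paths below are all hypothetical, not the actual contents of `build-images.sh`), extending the script to cover the driver / launcher might look like:

```shell
# Sketch only: derive kind-local image refs for the V2 driver / launcher,
# mirroring the build-and-push-to-kind-registry pattern used for the other
# backend images. All names and paths here are assumptions.
REGISTRY="kind-registry:5000"
TAG="latest"
DRIVER_IMAGE="${REGISTRY}/kfp-driver:${TAG}"
LAUNCHER_IMAGE="${REGISTRY}/kfp-launcher:${TAG}"

# Build and push (commented out; the Dockerfile paths are placeholders):
# docker build -t "${DRIVER_IMAGE}"   -f backend/Dockerfile.driver .
# docker build -t "${LAUNCHER_IMAGE}" -f backend/Dockerfile.launcher .
# docker push "${DRIVER_IMAGE}" && docker push "${LAUNCHER_IMAGE}"

echo "${DRIVER_IMAGE} ${LAUNCHER_IMAGE}"
```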
The set of images which are built by `build-images.sh` does not currently include the V2 driver and launcher. Even if this is changed, there would still be additional work required to assure these built images would be used by the backend during testing. Namely, the backend has defaults for which images to use (see here) which normally point to `gcr.io` locations. Work would need to be done to override these defaults so that during PR testing, the built images would be used instead of the ones previously deployed on `gcr.io`.
Discussion of implementation

Updating `build-images.sh` would likely be pretty straightforward. Then the `V2_DRIVER_IMAGE` / `V2_LAUNCHER_IMAGE` environment variables can be used to override the `gcr.io` defaults (configured via the `deployment.apps/ml-pipeline` deployment). @hbelmiro has suggested maybe using a Kustomize layer for updating these during testing.

What about releases?
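A minimal sketch of what the Kustomize-layer idea could look like. The namespace and container name below are assumptions for illustration and have not been verified against the actual manifests:

```yaml
# Hypothetical patch for a test-only Kustomize overlay that overrides the
# driver / launcher defaults on the ml-pipeline deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-pipeline
  namespace: kubeflow          # assumed namespace
spec:
  template:
    spec:
      containers:
        - name: ml-pipeline-api-server   # assumed container name
          env:
            - name: V2_DRIVER_IMAGE
              value: kind-registry:5000/kfp-driver:latest
            - name: V2_LAUNCHER_IMAGE
              value: kind-registry:5000/kfp-launcher:latest
```

An overlay like this could be applied only by the CI flow, leaving the default manifests (and their `gcr.io` images) untouched for releases.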
Although it makes sense to build driver / launcher images and test them during PRs, it may make sense to NOT override the `V2_DRIVER_IMAGE` / `V2_LAUNCHER_IMAGE` defaults and instead test against the `gcr.io` deployments when validating releases. Since users will be unlikely to override these values and will use `gcr.io`, it is reasonable to test in that configuration.

I am not aware of the extent to which `kfp-samples.yml` (or other workflows consuming the `kfp-cluster` action) are executed during release processes. Please let me know if others have more info on this :)

Related slack thread: https://cloud-native.slack.com/archives/C073N7BMLB1/p1727104197895549