Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): Refactor API server to introduce support for Docker image registries and s3 bucket support #394

Merged

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented Sep 18, 2024

Note 🚨

This PR should not be merged without the changes in caraml-dev/mlp#116 being merged, published and imported as dependencies first. This PR is currently only using a branch of a fork (the source of the quoted PR) of the MLP repository.

Context

In order to provide support for using image registries that use Docker registry credentials as well as AWS S3-based blob storage services, this PR refactors the API server to support these additional image registry and bob storage options. More concretely, these are the following changes made:

  • Set up the workflow needed to allow platform maintainers to configure the image registry that the API server will push images to (Docker or Google Cloud/Artifact Registry) as well as the blob storage service that it should read ensembler artifacts from/write files to (S3-based store/Google Cloud Storage)
  • Allow the API server to access a configured Docker registry to check if an image is available
  • Allow Kaniko jobs spun up by the API server to use load ensembler artifacts from a configured S3-based store when building model images
  • Allow Kaniko jobs spun up by the API server to push images to the configured Docker registry

Modifications

  • api/turing/api/appcontext.go - Add additional argument to be passed to the job builder so that it knows what file blob type storage it needs to use
  • api/turing/config/config.go - Introduce new configs for platform maintainers to specify the PushRegistryType and the DockerCredentialSecretName
  • api/turing/imagebuilder/imagebuilder.go - Make changes to the image builder to configure the Kaniko job spec correctly depending on the selected registry type and blob storage type
  • engines/pyfunc-ensembler-job/Dockerfile - Add steps to the base batch ensembler image to install the AWS CLI
  • engines/pyfunc-ensembler-job/app.Dockerfile - Add steps to the batch ensembler docker image to authenticate and pull ensembler artifacts correctly depending on the configured blob storage type
  • engines/pyfunc-ensembler-service/Dockerfile - Add steps to the base real-time ensembler image to install the AWS CLI
  • engines/pyfunc-ensembler-service/app.Dockerfile - Add steps to the real-time ensembler docker image to authenticate and pull ensembler artifacts correctly depending on the configured blob storage type
  • infra/charts/turing/values.yaml - Added configs needed for the e2e test in the CICD job

Some tiny caveats found

  • Adding the AWS CLI and building an ensembler image using Kaniko makes Kaniko OOM is it only has 4GB of memory set as its limits
  • The current naming convention of ensembler images (see this and this), with its slashes /, seems to not work when pushing to Docker Hub, as Docker Hub doesn't seem to accept slashes in a repository (image)'s name.

@deadlycoconuts deadlycoconuts added the enhancement New feature or request label Sep 18, 2024
@deadlycoconuts deadlycoconuts self-assigned this Sep 18, 2024
Copy link

codecov bot commented Sep 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.92%. Comparing base (1c83ad3) to head (1c6263a).
Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #394   +/-   ##
=======================================
  Coverage   95.92%   95.92%           
=======================================
  Files          25       25           
  Lines        2035     2035           
=======================================
  Hits         1952     1952           
  Misses         83       83           
Flag Coverage Δ
sdk-test-3.10 95.92% <ø> (ø)
sdk-test-3.8 95.92% <ø> (ø)
sdk-test-3.9 95.92% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@bthari bthari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR and the detailed changelog! A bit of question, you mentioned that Docker Hub doesn’t support image name with slash, are we planning to change the name format later, or what's the plan for it?

api/turing/config/config.go Show resolved Hide resolved
@deadlycoconuts
Copy link
Contributor Author

A bit of question, you mentioned that Docker Hub doesn’t support image name with slash, are we planning to change the name format later, or what's the plan for it?

@bthari hmm I'm thinking/hoping that it might just be a problem specific to Docker Hub but if we aren't pushing to Docker Hub then it should be okay. However, I think a more comprehensive solution would be to allow the naming convention to be configurable on a platform-level (via the API server configs) but that might need a little more time to expose as a config.

@deadlycoconuts
Copy link
Contributor Author

Thanks a lot for the comments @tiopramayudi and @bthari ! I should've covered and addressed all of the outstanding comments. If all looks good feel free to approve it again otherwise let me know what additional changes we should include and I'll get back to it. Thanks once again!

Copy link
Contributor

@tiopramayudi tiopramayudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding one more feedback, the rest is LGTM! Feel free to merge this, thanks @deadlycoconuts !

api/turing/imagebuilder/imagebuilder.go Outdated Show resolved Hide resolved
engines/pyfunc-ensembler-job/Dockerfile Show resolved Hide resolved
@deadlycoconuts
Copy link
Contributor Author

Thanks for the re-review @tiopramayudi! I'll merge this soon! :D

@deadlycoconuts deadlycoconuts merged commit 100c478 into caraml-dev:main Nov 7, 2024
29 checks passed
@deadlycoconuts deadlycoconuts deleted the refactor_away_google_deps branch November 11, 2024 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants