Use python slim instead of Cuda base image #1205

Closed
technillogue wants to merge 4 commits from syl/slim-base-image

Conversation

technillogue
Contributor

One weird trick to save 20GB of user bandwidth and cut our cold start times in half.

I'm not sure how to set up the config to default to false though.
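
A minimal sketch of one way the config side could look, assuming the option ends up as the use_cuda_base_image key discussed below; the Go package, struct, and method names here are illustrative, not Cog's actual config code. Using a pointer keeps "unset" distinct from an explicit true/false, so the default can be flipped in one place later.

```go
package config

// Illustrative sketch only, not Cog's actual config schema. A *bool lets us
// tell "unset" apart from an explicit true/false in cog.yaml.
type Build struct {
	GPU              bool  `yaml:"gpu"`
	UseCudaBaseImage *bool `yaml:"use_cuda_base_image,omitempty"`
}

// UseCudaBaseImageOrDefault returns the configured value, falling back to a
// single default when the option is absent from cog.yaml (true here for
// backwards compatibility; which default is right is what this thread debates).
func (b *Build) UseCudaBaseImageOrDefault() bool {
	if b.UseCudaBaseImage == nil {
		return true
	}
	return *b.UseCudaBaseImage
}
```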

technillogue and others added 4 commits July 12, 2023 16:50
Signed-off-by: technillogue <[email protected]>
Signed-off-by: technillogue <[email protected]>
Signed-off-by: Mattt Zmuda <[email protected]>
@andreasjansson
Member

Defaulting use_cuda_base_image to true means that we're completely backwards compatible, but people might not discover that they can get a much smaller image.

Defaulting to false means that some high percentage of users will get smaller images, but for some low percentage they'll get a really hard-to-debug error message.

This is tricky... We could improve those percentages by magically setting use_cuda_base_image based on the presence of tensorflow or jax (see the sketch after this comment), but that makes the hard-to-debug error message even more opaque for people who actually need the cuda devel base image.

Unrelated, but we've also had people ask to use custom base images. I'm not sure if we should encourage that since it most likely creates even harder-to-debug errors.
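
A rough sketch of the auto-detection heuristic described above, assuming we have the parsed python_packages list from cog.yaml to inspect; the package, variable, and function names are illustrative, not Cog's actual code.

```go
package config

import "strings"

// Frameworks assumed (for this sketch) to need the CUDA devel base image
// rather than python slim.
var needsCudaDevel = []string{"tensorflow", "jax"}

// GuessUseCudaBaseImage defaults to the slim base image unless a framework
// known to need the devel image appears in the model's python_packages.
func GuessUseCudaBaseImage(pythonPackages []string) bool {
	for _, pkg := range pythonPackages {
		// Strip a pinned version like "tensorflow==2.12.0" and normalize case.
		name := strings.ToLower(strings.SplitN(pkg, "==", 2)[0])
		for _, framework := range needsCudaDevel {
			if strings.HasPrefix(name, framework) {
				return true
			}
		}
	}
	return false
}
```

The trade-off is the one spelled out above: anyone whose CUDA-devel-dependent package falls outside that list would silently get the slim image and the hard-to-debug failure.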

@mattt
Contributor

mattt commented Jul 13, 2023

@technillogue Thanks so much for putting this together. I'm encouraged by how far we can get by moving away from CUDA dev base images.

I originally imagined this as a build flag (cog build --use-cuda-base-image=false). Making that work is more involved, but it may be a better fit; see the mattt/use-cuda-base-image-flag branch.

Expanding on what @andreasjansson said, and following up on our conversation yesterday, I think we should start by implementing this as an opt-in feature. That solves an immediate need we have, and gives us a good opportunity to continue validating this functionality without the risk of breaking existing Cog models.

At the same time, this should be a big improvement for a majority of users, and it'd be nice to migrate everyone to this sooner rather than later. I wrote up an idea for how we could solve that here: #1207

@technillogue
Contributor Author

mattt/use-cuda-base-image-flag looks really good; I'd be really happy to have that while we figure out the right way to handle #1207. I hadn't thought about what to do with the cuda and cudnn options. We might still need cuda, separately from the base image, to pick the right torch for users (since we're relying on torch's CUDA libraries).

As we discussed, we'd also like to support custom Dockerfiles that we add labels to, but not custom base images that might not have apt.

@bfirsh
Member

bfirsh commented Jul 14, 2023

Could this be detected automatically? Andreas hints at a hard-to-debug error message. Could we automatically detect that error message and hint at the solution?

I like the idea of trying to do as much automatically as possible, then giving users a way to bail out if they need to.

@mattt
Contributor

mattt commented Jul 27, 2023

#1214 implements this as a --use-cuda-base-image build flag that defaults to "auto", which we currently take to be true in all cases.

@bfirsh, to your point, we can absolutely iterate towards making the automatic behavior better. We should eventually have it so that "auto" implies false in most cases, except the ones @andreasjansson alluded to where there are known issues; a rough sketch of that resolution follows this comment.

@technillogue has a PR to allow users to specify a custom base image in #1229.
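
For reference, a sketch of how the tri-state flag could resolve, matching the behavior described in this comment ("auto" currently means true in all cases); the function here is illustrative, not the code in #1214.

```go
package config

import "fmt"

// ResolveUseCudaBaseImage maps the --use-cuda-base-image flag value to a
// concrete choice. Per the comment above, "auto" currently resolves to true;
// a later iteration could swap in a heuristic (like the sketch earlier in
// this thread) so that "auto" means false unless a known-problematic
// dependency is present.
func ResolveUseCudaBaseImage(value string) (bool, error) {
	switch value {
	case "true":
		return true, nil
	case "false":
		return false, nil
	case "auto":
		return true, nil
	default:
		return false, fmt.Errorf("invalid value %q for --use-cuda-base-image", value)
	}
}
```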

@mattt closed this on Jul 27, 2023
@mattt deleted the syl/slim-base-image branch on July 31, 2023