
feat: PyTorch Extras Container #26

Merged · 2 commits · Jul 17, 2023

Conversation

@Eta0 (Collaborator) commented on Jun 29, 2023

torch-extras Container

This PR adds a new container named ml-containers/torch-extras, which is ml-containers/torch with the supplementary libraries DeepSpeed and flash-attention added.
The code is originally based on #21, but significantly generalized, with the finetuner's application-specific parts removed.
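
Roughly, the layering looks like the following sketch. This is illustrative only: the image name, tag, and build flags are assumptions, the real Dockerfile precompiles more DeepSpeed ops and flash-attn components, and it presumes the CUDA compiler is available in the build environment.

```Dockerfile
# Illustrative sketch only -- not the exact Dockerfile in this PR.
ARG BASE_IMAGE=ghcr.io/coreweave/ml-containers/torch:base
FROM ${BASE_IMAGE}

# Helpers commonly needed to compile the CUDA extensions.
RUN pip install --no-cache-dir ninja packaging

# Compile DeepSpeed's ops and flash-attention against the PyTorch
# already present in the base image.
RUN DS_BUILD_OPS=1 pip install --no-cache-dir deepspeed && \
    pip install --no-cache-dir --no-build-isolation flash-attn
```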

Rationale

DeepSpeed and flash-attention both require the CUDA development tools to install properly, which complicates using them with anything but an nvidia/cuda:...-devel based image. Optionally including them in our ml-containers/torch containers allows for still-lightweight images that can use those powerful libraries without the full CUDA development toolkit. It also reduces compile time for downstream Dockerfiles, since flash-attention takes a long time to compile at whichever step it is included.
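
For a concrete downstream picture (the tag, file names, and entrypoint below are hypothetical): a project that needs flash-attention no longer has to start from a -devel image and compile it itself; it can build directly on the extras image:

```Dockerfile
# Hypothetical downstream Dockerfile: DeepSpeed and flash-attn are already
# compiled into the base image, so no CUDA development toolkit and no long
# extension builds are needed at this layer.
FROM ghcr.io/coreweave/ml-containers/torch-extras:nccl

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt  # remaining pure-Python deps
COPY . .
CMD ["python", "train.py"]
```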

Structure

ml-containers/torch-extras is split out as a separate container, unlike the tag-differentiated torch:base and torch:nccl flavours of the baseline torch image. Its images are simply layers on top of the torch:base and torch:nccl images, and are built as a second CI step immediately after either of those two is built.
Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases themselves, the secondary step that builds these images can be temporarily disabled via flags in torch-base.yml and torch-nccl.yml until the libraries catch up.
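
The gating could look roughly like this (job, workflow, and output names below are hypothetical; the actual torch-base.yml and torch-nccl.yml may express the flag differently):

```yaml
# Hypothetical excerpt in the spirit of torch-base.yml
jobs:
  build-torch-base:
    uses: ./.github/workflows/build.yml
    # ... builds and pushes ml-containers/torch:base ...

  build-torch-extras:
    needs: build-torch-base
    # Flip to false while DeepSpeed / flash-attention lag behind a new
    # PyTorch release; restore once they catch up.
    if: true
    uses: ./.github/workflows/torch-extras.yml
    with:
      base-image: ${{ needs.build-torch-base.outputs.image }}
```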

I welcome comments and suggestions on this build process and structure, because it involves tradeoffs: it guarantees that the torch-extras containers are rebuilt, whenever possible, on every new torch image update, but it makes building the torch-extras containers standalone, if desired, more difficult.

Eta0 added 2 commits June 16, 2023 19:53
Based off of coreweave/ml-containers PR coreweave#21, with application-specific
parts removed, and more precompiled DeepSpeed ops and flash-attn
components included.
@Eta0 added the enhancement label on Jun 29, 2023
@Eta0 requested a review from wbrown on Jun 29, 2023 16:38
@Eta0 self-assigned this on Jun 29, 2023
@wbrown (Collaborator) left a comment:

👍

@wbrown merged commit 949759a into coreweave:main on Jul 17, 2023