# torch-extras Container

This PR adds a new container named `ml-containers/torch-extras`, which is `ml-containers/torch` with the supplementary libraries DeepSpeed and flash-attention. The code is originally based on #21, but is significantly more generalized, with the finetuner application-specific parts removed.
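As a rough sketch of the layering (the image names, tags, and build steps below are illustrative, not taken from this PR's actual Dockerfile):

```dockerfile
# Illustrative sketch only: compile the CUDA-dependent wheels in a
# -devel builder stage, then install just the wheels into the
# lightweight torch image.
ARG BASE_IMAGE=ml-containers/torch:base  # hypothetical tag

FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip
# flash-attention's build requires a matching torch present at compile time.
RUN pip3 install torch && \
    pip3 wheel --no-deps --wheel-dir /wheels deepspeed flash-attn

FROM ${BASE_IMAGE}
COPY --from=builder /wheels /tmp/wheels
RUN pip3 install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels
```

The point of the two stages is that the full CUDA toolkit only ever exists in the throwaway builder stage, so the final image stays lightweight.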
## Rationale

DeepSpeed and flash-attention both require the CUDA development tools to install properly, which complicates using them with anything but an `nvidia/cuda:...-devel`-based image. Optionally including them in our `ml-containers/torch` containers allows for still-lightweight images that can use these powerful libraries without the full CUDA development toolkit. It also reduces compile time for downstream Dockerfiles, since flash-attention takes a long time to compile at whatever build step it is included in.

## Structure
`ml-containers/torch-extras` is split out as a separate container, unlike the tag-differentiated `torch:base` and `torch:nccl` flavours of the baseline torch image. The `torch-extras` images are simply layers on top of the `torch:base` and `torch:nccl` images, and are built as a second CI step immediately after either of those two is built.

Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases themselves, the secondary step that builds these images can be temporarily disabled via flags in `torch-base.yml` and `torch-nccl.yml` until the libraries become compatible.

I welcome comments and suggestions on this build process and structure, because it involves tradeoffs: it guarantees that the `torch-extras` containers are always built, whenever possible, on new `torch` image updates, but it makes it more difficult to build the `torch-extras` containers standalone, if desired.
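For concreteness, the temporary-disable flag described above could look roughly like the fragment below. The job names, workflow paths, and flag mechanism are hypothetical, not the exact contents of `torch-base.yml`:

```yaml
# Hypothetical fragment of torch-base.yml
jobs:
  build-base:
    uses: ./.github/workflows/build.yml
  build-extras:
    needs: build-base
    # Temporary kill switch: set to false until DeepSpeed and
    # flash-attention are compatible with the new PyTorch release.
    if: true
    uses: ./.github/workflows/torch-extras.yml
```

With this shape, flipping a single `if:` condition skips the `torch-extras` step while leaving the baseline `torch` builds untouched.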