Commit

update document
tocean committed Feb 18, 2024
1 parent ac666ac commit 8e12fb8
Showing 6 changed files with 24 additions and 11 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/unit-tests.yaml
@@ -16,11 +16,11 @@ jobs:
# 1.14.0a0+410ce96
- torch: "1.14"
nvcr: 22.12-py3
- dir: torch1
- # 2.1.0a0+fe05266f
+ dir: ../torch1
+ # 2.1.0a0+32f93b1
- torch: "2.1"
nvcr: 23.10-py3
- dir: torch2
+ dir: ../torch2
container:
image: nvcr.io/nvidia/pytorch:${{ matrix.nvcr }}
options: --privileged --ipc=host --gpus=all
5 changes: 5 additions & 0 deletions dockerfile/entrypoint.sh
@@ -0,0 +1,5 @@
+ #!/bin/bash
+
+ ldconfig
+
+ exec "$@"
5 changes: 5 additions & 0 deletions dockerfile/torch1.14-cuda11.8.dockerfile
@@ -57,3 +57,8 @@ RUN python3 -m pip install . && \
make postinstall

ENV LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:/usr/local/lib/libnccl.so:${LD_PRELOAD}"

+ # Set up entrypoint
+ COPY dockerfile/entrypoint.sh /entrypoint.sh
+ RUN chmod +x /entrypoint.sh
+ ENTRYPOINT ["/entrypoint.sh"]
11 changes: 7 additions & 4 deletions dockerfile/torch2.1-cuda12.2.dockerfile
@@ -1,11 +1,11 @@
FROM nvcr.io/nvidia/pytorch:23.10-py3

# Ubuntu: 22.04
- # Python: 3.8
+ # Python: 3.10
# CUDA: 12.2.0
# cuDNN: 8.9.5
# NCCL: v2.16.2-1 + FP8 Support
- # PyTorch: 2.1.0a0+fe05266f
+ # PyTorch: 2.1.0a0+32f93b1

LABEL maintainer="MS-AMP"

@@ -44,8 +44,6 @@ WORKDIR /opt/msamp
ADD third_party third_party
RUN cd third_party/msccl && \
make -j ${NUM_MAKE_JOBS} src.build NVCC_GENCODE="\
- -gencode=arch=compute_70,code=sm_70 \
- -gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_90,code=sm_90" && \
make install
# cache TE build to save time in CI
@@ -57,3 +55,8 @@ RUN python3 -m pip install . && \
make postinstall

ENV LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:/usr/local/lib/libnccl.so:${LD_PRELOAD}"

+ # Set up entrypoint
+ COPY dockerfile/entrypoint.sh /entrypoint.sh
+ RUN chmod +x /entrypoint.sh
+ ENTRYPOINT ["/entrypoint.sh"]
Binary file modified docs/assets/gpt-performance.png
8 changes: 4 additions & 4 deletions docs/getting-started/installation.mdx
@@ -18,7 +18,7 @@ Here're the system requirements for MS-AMP.
* CUDA version 11 or later (which can be checked by running `nvcc --version`).
* PyTorch version 1.14 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).

- You can try MS-AMP in two ways: Using Docker or installing from source:
+ You can try MS-AMP in two ways: Using Docker or installing from source.

* Using Docker is a convenient way to get started with MS-AMP. You can use the pre-built Docker image to quickly set up an environment for running MS-AMP.
* On the other hand, installing from source gives you more control over the installation process and allows you to customize the installation to your needs.
@@ -28,8 +28,8 @@ You can try MS-AMP in two ways: Using Docker or installing from source:
You can try the latest MS-AMP Docker container with the following commands:

```bash
- sudo docker run -it -d --name=msampcu121 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.1 bash
- sudo docker exec -it msampcu121 bash
+ sudo docker run -it -d --name=msampcu122 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.2 bash
+ sudo docker exec -it msampcu122 bash
```

MS-AMP is pre-installed in Docker container and you can verify it by running:
@@ -46,7 +46,7 @@ We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.c
For example, to start PyTorch 2.1 container, run the following command:

```bash
- sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.04-py3 bash
+ sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.10-py3 bash
sudo docker exec -it msamp bash
```
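
Note on the version requirements touched in installation.mdx (CUDA 11 or later, PyTorch 1.14 or later): NGC builds report versions such as `2.1.0a0+32f93b1`, which cannot be compared as plain strings ("1.14" would sort before "1.2"). A minimal sketch of a numeric check, assuming only that the string starts with `major.minor`; the helper names `parse_major_minor` and `meets_minimum` are hypothetical and not part of MS-AMP:

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0a0+32f93b1' or '1.14'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare numerically, so that 1.14 counts as newer than 1.2."""
    return parse_major_minor(version) >= parse_major_minor(minimum)


if __name__ == "__main__":
    # Version strings taken from the workflow comments in this commit,
    # plus one hypothetical older release for contrast.
    for candidate in ("1.14.0a0+410ce96", "2.1.0a0+32f93b1", "1.13.1"):
        print(candidate, meets_minimum(candidate, "1.14"))
```

Inside a container, the string passed in would typically come from `torch.__version__`, as in the check command quoted in installation.mdx.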
