Update workflows that use cu116 to cu117 (microsoft#5361)
The following workflows specified runners labeled cu116; this change
updates them to cu117.

Workflows impacted:
- [x] nv-accelerate-v100
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8557768042/job/23450811816?pr=5361): 22 passed, 5 skipped, 11 warnings in 129.04s (0:02:09)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8547131990/job/23418750315): 22 passed, 5 skipped, 11 warnings in 318.84s (0:05:18)
- [x] nv-ds-chat
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8546543733/job/23417119129): 15 passed, 1 skipped in 2729.91s (0:45:29)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8531148226/job/23370268262): 15 passed, 1 skipped in 3511.82s (0:58:31)
- [x] nv-inference - recently failing and disabled, needs fixes.
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8558749560): 36 failed, 74 passed, 95 skipped, 4 warnings in 877.45s (0:14:37)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8546382497/job/23416626521): 36 failed, 74 passed, 95 skipped, 4 warnings in 3633.34s (1:00:33)
- [x] nv-mii
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8557768075/job/23450812054?pr=5361): 4 passed, 23 deselected, 3 warnings in 116.28s (0:01:56)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8547246351/job/23419064526): 4 passed, 23 deselected, 3 warnings in 196.79s (0:03:16)
- [x] nv-nightly
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8557763671/job/23450792634): 3 passed, 3 skipped, 4713 deselected, 1 warning in 1831.83s (0:30:31)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8547230983/job/23419020962): 3 passed, 3 skipped, 4713 deselected, 1 warning in 2459.06s (0:40:59)
- [x] nv-torch-latest-v100
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8557768039/job/23450811779): 947 passed, 169 skipped, 4 warnings in 2550.25s (0:42:30) and 61 passed, 4 skipped, 4643 deselected, 1 warning in 563.34s (0:09:23)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8547232496/job/23419024966): 947 passed, 169 skipped, 4 warnings in 3216.47s (0:53:36) and 61 passed, 4 skipped, 4643 deselected, 1 warning in 611.17s (0:10:11)
- [x] nv-torch-nightly-v100
  - [new build](https://github.com/microsoft/DeepSpeed/actions/runs/8558930744): 13 failed, 982 passed, 121 skipped, 4 warnings in 2691.26s (0:44:51)
  - [old build](https://github.com/microsoft/DeepSpeed/actions/runs/8558895638): 13 failed, 982 passed, 121 skipped, 4 warnings in 3117.03s (0:51:57)
- [x] nv-transformers-v100 - disabled for 4 months, needs work regardless.
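
The old and new timings listed above can be compared directly. The following is a minimal sketch, not part of DeepSpeed: it hard-codes the wall-clock seconds quoted in the pytest summaries and prints how much shorter each new run was. The names `TIMINGS` and `reduction_pct` are hypothetical, and for nv-torch-latest-v100 only the first of its two pytest runs is included. Each figure comes from a single run, so the percentages are indicative only.

```python
# Old vs. new pytest wall-clock times in seconds, copied from the
# commit message. Illustrative helper only, not part of DeepSpeed.
TIMINGS = {
    "nv-accelerate-v100": (318.84, 129.04),
    "nv-ds-chat": (3511.82, 2729.91),
    "nv-inference": (3633.34, 877.45),
    "nv-mii": (196.79, 116.28),
    "nv-nightly": (2459.06, 1831.83),
    "nv-torch-latest-v100": (3216.47, 2550.25),  # first pytest run only
    "nv-torch-nightly-v100": (3117.03, 2691.26),
}


def reduction_pct(old_s: float, new_s: float) -> float:
    """Return the percentage by which new_s is shorter than old_s."""
    return (old_s - new_s) / old_s * 100.0


if __name__ == "__main__":
    for name, (old_s, new_s) in TIMINGS.items():
        pct = reduction_pct(old_s, new_s)
        print(f"{name}: {old_s:.2f}s -> {new_s:.2f}s ({pct:.1f}% shorter)")
```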
loadams authored and umchand committed May 20, 2024
1 parent 7b211b6 commit f6e9adb
Showing 8 changed files with 10 additions and 10 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/nv-accelerate-v100.yml
@@ -19,7 +19,7 @@ concurrency:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/nv-ds-chat.yml
@@ -21,7 +21,7 @@ permissions:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/nv-inference.yml
@@ -22,7 +22,7 @@ concurrency:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/nv-mii.yml
@@ -27,7 +27,7 @@ concurrency:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
6 changes: 3 additions & 3 deletions .github/workflows/nv-nightly.yml
@@ -15,7 +15,7 @@ permissions:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
@@ -25,7 +25,7 @@ jobs:

       - name: Install pytorch
         run: |
-          pip install -U --cache-dir $TORCH_CACHE torch==1.13.1 torchvision --index-url https://download.pytorch.org/whl/cu116
+          pip install -U --cache-dir $TORCH_CACHE torch==1.13.1 torchvision --index-url https://download.pytorch.org/whl/cu117
           python -c "import torch; print('torch:', torch.__version__, torch)"
           python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -55,7 +55,7 @@ jobs:
         run: |
           unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
           cd tests
-          pytest $PYTEST_OPTS --forked -m 'nightly' unit/ --torch_ver="1.13" --cuda_ver="11.7"
+          pytest $PYTEST_OPTS --forked -m 'nightly' unit/ --torch_ver="1.13" --cuda_ver="11.7"
       - name: Open GitHub issue if nightly CI fails
         if: ${{ failure() && (github.event_name == 'schedule') }}
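
In nv-nightly.yml the CUDA version appears in three places that change together: the runner label (`cu117`), the PyTorch wheel index URL (`.../whl/cu117`), and the `--cuda_ver="11.7"` argument passed to pytest. The sketch below shows one way such values could be cross-checked. The helpers `cuda_tag_to_version` and `index_url_tag` are hypothetical and not part of DeepSpeed or its test suite, and the tag parsing assumes the last digit of the tag is the minor version, which holds for tags such as cu116 and cu117.

```python
import re


def cuda_tag_to_version(tag: str) -> str:
    """Convert a wheel tag such as 'cu117' to a dotted version such as '11.7'.

    Assumes the final digit is the minor version (true for cu116, cu117).
    """
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"unrecognised CUDA tag: {tag!r}")
    return f"{match.group(1)}.{match.group(2)}"


def index_url_tag(index_url: str) -> str:
    """Return the final path component of a wheel index URL, e.g. 'cu117'."""
    return index_url.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Values taken from the updated nv-nightly.yml in this commit.
    runner_label = "cu117"
    index_url = "https://download.pytorch.org/whl/cu117"
    cuda_ver = "11.7"

    assert index_url_tag(index_url) == runner_label
    assert cuda_tag_to_version(runner_label) == cuda_ver
    print("runner label, wheel index and --cuda_ver agree")
```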
2 changes: 1 addition & 1 deletion .github/workflows/nv-torch-latest-v100.yml
@@ -19,7 +19,7 @@ concurrency:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/nv-torch-nightly-v100.yml
@@ -15,7 +15,7 @@ permissions:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/nv-transformers-v100.yml
@@ -18,7 +18,7 @@ concurrency:

 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu116, v100]
+    runs-on: [self-hosted, nvidia, cu117, v100]

     steps:
       - uses: actions/checkout@v3
