Commit

Merge branch 'main' into will/bump_aws_ofi_nccl
willgleich authored Aug 21, 2024
2 parents 4fe3ea7 + dec879e commit 3d67422
Showing 98 changed files with 962 additions and 985 deletions.
6 changes: 3 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -16,13 +16,13 @@ Example:
-->

# Before submitting
- [ ] Have you read the [contributor guidelines](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md)?
- [ ] Have you read the [contributor guidelines](https://github.com/mosaicml/composer/blob/main/CONTRIBUTING.md)?
- [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
- [ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
- [ ] Did you update any related docs and document your change?
- [ ] Did you update any related tests and add any new tests related to your change? (see [testing](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md#running-tests))
- [ ] Did you update any related tests and add any new tests related to your change? (see [testing](https://github.com/mosaicml/composer/blob/main/CONTRIBUTING.md#running-tests))
- [ ] Did you run the tests locally to make sure they pass?
- [ ] Did you run `pre-commit` on your change? (see the `pre-commit` section of [prerequisites](https://github.com/mosaicml/composer/blob/dev/CONTRIBUTING.md#prerequisites))
- [ ] Did you run `pre-commit` on your change? (see the `pre-commit` section of [prerequisites](https://github.com/mosaicml/composer/blob/main/CONTRIBUTING.md#prerequisites))

<!--
Thanks so much for contributing to composer! We really appreciate it :)
2 changes: 1 addition & 1 deletion .github/workflows/code-quality.yaml
@@ -2,7 +2,6 @@ name: Code Quality Checks
on:
push:
branches:
- dev
- main
- release/**
pull_request:
@@ -19,6 +18,7 @@ jobs:
code-quality:
runs-on: ubuntu-20.04
timeout-minutes: 15
if: github.repository_owner == 'mosaicml'
strategy:
matrix:
python_version:
72 changes: 34 additions & 38 deletions .github/workflows/daily.yaml
@@ -4,7 +4,6 @@ on:
- cron: "30 2 * * *" # 2:30 every day
push:
branches:
- dev
- main
- release/**
workflow_dispatch:
@@ -18,53 +17,53 @@ jobs:
strategy:
matrix:
include:
- name: cpu-3.10-2.1
container: mosaicml/pytorch:2.1.2_cpu-python3.10-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: cpu-3.11-2.2
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: cpu-3.11-2.2-composer
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: composer
- name: cpu-3.11-2.3
container: mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: cpu-3.11-2.4
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: cpu-3.11-2.4-composer
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: composer
- name: cpu-doctest
container: mosaicml/pytorch:2.1.2_cpu-python3.10-ubuntu20.04
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: not daily and (remote or not remote) and not gpu and doctest
pytest_command: coverage run -m pytest tests/test_docs.py
composer_package_name: mosaicml
- name: daily-cpu-3.10-2.1
container: mosaicml/pytorch:2.1.2_cpu-python3.10-ubuntu20.04
- name: daily-cpu-3.11-2.2
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
markers: daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: daily-cpu-3.11-2.2
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
- name: daily-cpu-3.11-2.3
container: mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
markers: daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
- name: daily-cpu-3.11-2.2-composer
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
- name: daily-cpu-3.11-2.4
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: composer
- name: daily-cpu-3.11-2.3-composer
container: mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
composer_package_name: mosaicml
- name: daily-cpu-3.11-2.4-composer
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: daily and (remote or not remote) and not gpu and not doctest
pytest_command: coverage run -m pytest
composer_package_name: composer
- name: daily-cpu-doctest
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: daily and (remote or not remote) and not gpu and doctest
pytest_command: coverage run -m pytest tests/test_docs.py
composer_package_name: mosaicml
@@ -77,13 +76,10 @@ jobs:
pytest-command: ${{ matrix.pytest_command }}
pytest-markers: ${{ matrix.markers }}
composer_package_name: ${{ matrix.composer_package_name }}
pytest-wandb-entity: "mosaicml-public-integration-tests"
pytest-wandb-project: "integration-tests-${{ github.sha }}"
safe_directory: composer
secrets:
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
wandb-api-key: ${{ secrets.WANDB_API_KEY }}
code-eval-device: ${{ secrets.CODE_EVAL_DEVICE }}
code-eval-url: ${{ secrets.CODE_EVAL_URL }}
code-eval-apikey: ${{ secrets.CODE_EVAL_APIKEY }}
@@ -106,12 +102,6 @@ jobs:
# Unlike CPU tests, we run daily tests together with GPU tests to minimize launch time
# on MCLOUD and not eat up all GPUs at once
include:
- name: "gpu-3.10-2.1-1-gpu"
container: mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 1
- name: "gpu-3.11-2.2-1-gpu"
container: mosaicml/pytorch:2.2.1_cu121-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
@@ -124,12 +114,12 @@ jobs:
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 1
- name: "gpu-3.10-2.1-2-gpu"
container: mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04
- name: "gpu-3.11-2.4-1-gpu"
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 2
gpu_num: 1
- name: "gpu-3.11-2.2-2-gpu"
container: mosaicml/pytorch:2.2.1_cu121-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
@@ -142,12 +132,12 @@ jobs:
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 2
- name: "gpu-3.10-2.1-4-gpu"
container: mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04
- name: "gpu-3.11-2.4-2-gpu"
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 4
gpu_num: 2
- name: "gpu-3.11-2.2-4-gpu"
container: mosaicml/pytorch:2.2.1_cu121-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
@@ -160,6 +150,12 @@ jobs:
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 4
- name: "gpu-3.11-2.4-4-gpu"
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: "(daily or not daily) and (remote or not remote) and gpu and (doctest or not doctest)"
pytest_command: "coverage run -m pytest"
composer_package_name: "mosaicml"
gpu_num: 4
name: ${{ matrix.name }}
if: github.repository_owner == 'mosaicml'
with:
@@ -171,7 +167,7 @@ jobs:
pip_deps: "[all]"
pytest-command: ${{ matrix.pytest_command }}
pytest-markers: ${{ matrix.markers }}
python-version: 3.9
python-version: 3.11
gpu_num: ${{ matrix.gpu_num }}
gha-timeout: 5400
secrets:
32 changes: 19 additions & 13 deletions .github/workflows/docker-configure-build-push.yaml
@@ -1,4 +1,4 @@
name: Docker Image Configure-Build-Push
name: Docker/GHCR Image Configure-Build-Push
on:
workflow_call:
inputs:
@@ -23,6 +23,9 @@ on:
staging-repo:
required: false
type: string
ghcr-staging-repo:
required: false
type: string
tags:
required: true
type: string
@@ -34,18 +37,14 @@ on:
required: true
password:
required: true
ghcr_username:
required: true
ghcr_password:
required: true
jobs:
configure-build-push:
runs-on: ubuntu-latest
runs-on: mosaic-4wide
steps:
- name: Maximize Build Space on Worker
uses: easimon/maximize-build-space@v4
with:
overprovision-lvm: true
remove-dotnet: true
remove-android: true
remove-haskell: true

- name: Checkout
uses: actions/checkout@v3

@@ -60,7 +59,12 @@ jobs:
with:
username: ${{ secrets.username }}
password: ${{ secrets.password }}

- name: Login to GHCR
uses: docker/login-action@v3
with:
username: ${{ secrets.ghcr_username }}
password: ${{ secrets.ghcr_password }}
registry: ghcr.io
- name: Calculate Docker Image Variables
run: |
set -euo pipefail
@@ -70,7 +74,8 @@ jobs:
###################
if [ "${{ inputs.staging }}" = "true" ]; then
STAGING_REPO=${{ inputs.staging-repo }}
IMAGE_TAG=${STAGING_REPO}:${{ inputs.image-uuid }}
GHCR_STAGING_REPO=${{ inputs.ghcr-staging-repo }}
IMAGE_TAG=${STAGING_REPO}:${{ inputs.image-uuid }},${GHCR_STAGING_REPO}:${{ inputs.image-uuid }}
IMAGE_CACHE="${STAGING_REPO}:${{ inputs.image-name }}-buildcache"
else
IMAGE_TAG=${{ inputs.tags }}
@@ -81,7 +86,8 @@ jobs:
echo "IMAGE_CACHE=${IMAGE_CACHE}" >> ${GITHUB_ENV}
- name: IMAGE_TAG = ${{ env.IMAGE_TAG }}
run: echo ${{ env.IMAGE_TAG }}
run: |
echo ${{ env.IMAGE_TAG }}
- name: Build and Push the Docker Image
uses: docker/build-push-action@v3
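
The staging branch of the `Calculate Docker Image Variables` step above now produces one tag per registry. The following is a minimal standalone sketch of that logic, not the workflow's actual script: the image UUID, image name, and `TAGS` values are hypothetical placeholders, the two repository names are the ones passed from `pr-docker.yaml` in this commit, and it assumes the downstream build step accepts a comma-separated tag list. The non-staging cache value is not visible in this hunk, so it is left empty here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the workflow inputs (assumptions, not repository values).
STAGING="true"
IMAGE_UUID="example-uuid"
IMAGE_NAME="example-image"
TAGS="example/repo:latest"

# Repository names as passed by pr-docker.yaml in this commit.
STAGING_REPO="mosaicml/ci-staging"
GHCR_STAGING_REPO="ghcr.io/databricks-mosaic/ci-staging"

IMAGE_CACHE=""
if [ "${STAGING}" = "true" ]; then
  # One tag per registry, joined with a comma so a single build pushes to both.
  IMAGE_TAG="${STAGING_REPO}:${IMAGE_UUID},${GHCR_STAGING_REPO}:${IMAGE_UUID}"
  # The build cache stays on the Docker Hub staging repo only.
  IMAGE_CACHE="${STAGING_REPO}:${IMAGE_NAME}-buildcache"
else
  # Non-staging builds use the caller-supplied tags unchanged.
  IMAGE_TAG="${TAGS}"
fi

echo "IMAGE_TAG=${IMAGE_TAG}"
echo "IMAGE_CACHE=${IMAGE_CACHE}"
```

Setting `STAGING` to anything other than `true` in the sketch exercises the pass-through branch, where only the caller's tags are used.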
10 changes: 5 additions & 5 deletions .github/workflows/pr-cpu.yaml
@@ -13,10 +13,6 @@ jobs:
strategy:
matrix:
include:
- name: cpu-3.10-2.1
container: mosaicml/pytorch:2.1.2_cpu-python3.10-ubuntu20.04
markers: not daily and not remote and not gpu and not doctest
pytest_command: coverage run -m pytest
- name: cpu-3.11-2.2
container: mosaicml/pytorch:2.2.1_cpu-python3.11-ubuntu20.04
markers: not daily and not remote and not gpu and not doctest
@@ -25,8 +21,12 @@ jobs:
container: mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
markers: not daily and not remote and not gpu and not doctest
pytest_command: coverage run -m pytest
- name: cpu-3.11-2.4
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: not daily and not remote and not gpu and not doctest
pytest_command: coverage run -m pytest
- name: cpu-doctest
container: mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
container: mosaicml/pytorch:2.4.0_cpu-python3.11-ubuntu20.04
markers: not daily and not remote and not gpu and doctest
pytest_command: coverage run -m pytest tests/test_docs.py
name: ${{ matrix.name }}
8 changes: 5 additions & 3 deletions .github/workflows/pr-docker.yaml
@@ -1,8 +1,7 @@
name: PR Docker
name: PR Docker/GHCR
on:
pull_request:
branches:
- dev
- main
- release/**
paths:
@@ -17,7 +16,7 @@ defaults:
jobs:
build-image-matrix:
if: github.repository_owner == 'mosaicml'
runs-on: ubuntu-latest
runs-on: linux-ubuntu-latest
timeout-minutes: 2
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
@@ -65,8 +64,11 @@ jobs:
push: true
staging: true
staging-repo: mosaicml/ci-staging
ghcr-staging-repo: ghcr.io/databricks-mosaic/ci-staging
tags: ${{ matrix.TAGS }}
target: ${{ matrix.TARGET }}
secrets:
username: ${{ secrets.DOCKER_HUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_PASSWORD }}
ghcr_username: ${{ secrets.GHCR_USERNAME }}
ghcr_password: ${{ secrets.GHCR_TOKEN }}
20 changes: 10 additions & 10 deletions .github/workflows/pr-gpu.yaml
@@ -6,15 +6,15 @@ on:
# or dev
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' && github.ref != 'refs/heads/dev' }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
jobs:
pytest-gpu-1:
uses: mosaicml/ci-testing/.github/workflows/[email protected]
strategy:
matrix:
include:
- name: gpu-3.11-2.3-1
container: mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
- name: gpu-3.11-2.4-1
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: not daily and not remote and gpu and (doctest or not doctest)
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
@@ -29,7 +29,7 @@ jobs:
pip_deps: "[all]"
pytest-command: ${{ matrix.pytest_command }}
pytest-markers: ${{ matrix.markers }}
python-version: 3.9
python-version: 3.11
gpu_num: 1
secrets:
mcloud-api-key: ${{ secrets.MCLOUD_API_KEY }}
@@ -39,8 +39,8 @@ jobs:
strategy:
matrix:
include:
- name: gpu-3.11-2.3-2
container: mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
- name: gpu-3.11-2.4-2
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: not daily and not remote and gpu and (doctest or not doctest)
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
@@ -55,7 +55,7 @@ jobs:
pip_deps: "[all]"
pytest-command: ${{ matrix.pytest_command }}
pytest-markers: ${{ matrix.markers }}
python-version: 3.9
python-version: 3.11
gpu_num: 2
secrets:
mcloud-api-key: ${{ secrets.MCLOUD_API_KEY }}
@@ -66,8 +66,8 @@ jobs:
strategy:
matrix:
include:
- name: gpu-3.11-2.3-4
container: mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
- name: gpu-3.11-2.4-4
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
markers: not daily and not remote and gpu and (doctest or not doctest)
pytest_command: coverage run -m pytest
composer_package_name: mosaicml
@@ -82,7 +82,7 @@ jobs:
pip_deps: "[all]"
pytest-command: ${{ matrix.pytest_command }}
pytest-markers: ${{ matrix.markers }}
python-version: 3.9
python-version: 3.11
gpu_num: 4
secrets:
mcloud-api-key: ${{ secrets.MCLOUD_API_KEY }}