Using Validation Framework
The validation framework is used to smoke test and validate PyTorch and the domain libraries on both CPU and GPU machines. Linux, Windows, and macOS (x86 and Apple Silicon) are supported. The requirements for the validation framework are:
- Support Linux, Windows, and macOS using ephemeral runners with only minimal dependencies installed
- Support CPU and GPU runners, with an older Nvidia GPU driver on the GPU runners to provide backward compatibility testing (our CI runs on the latest Nvidia drivers)
- Execute on a nightly basis
- Surface the results on HUD
- Cover all the released domain libraries
- Follow the same installation instructions as the get started page
- Cover the nightly, test, and release channels
- Use the same matrix as PyTorch Core and the Nova project, so that a new build introduced to PyTorch Core becomes available for validation
- Provide a smoke test that covers PyTorch standalone or PyTorch with all domain libraries
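
To make the last requirement concrete, here is a minimal sketch of what a smoke test could look like. It assumes that "smoke test" means, at minimum, importing each package and reporting its version; the function name `smoke_test` is hypothetical and this is not the script the framework actually runs.

```python
import importlib


def smoke_test(module_names):
    """Try to import each module; record its version or the failure reason."""
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # a broken binary can fail in several ways
            results[name] = f"FAILED: {exc}"
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    # ["torch"] alone is the standalone case; adding the domain
    # libraries covers the "PyTorch with all domain libraries" case.
    for name, status in smoke_test(["torch", "torchvision", "torchaudio"]).items():
        print(f"{name}: {status}")
```

A real smoke test would likely go further, for example by running a small tensor operation on the GPU when one is present.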
The validation framework is used in two different ways:
- Nightly validation of PyTorch, TorchAudio, and TorchVision as one ecosystem, using the same instructions as the get started page. These workflows are implemented in validate-binaries.yml and are used for nightly and release validation by the PyTorch Dev Infra team.
- Standalone domain library validation, currently implemented for the TorchText and TorchRec domain libraries. This is a completely customized way of using the validation framework, and in theory this approach can be used to validate any project within the PyTorch organization. Please see the onboarding documentation if you are interested in using the validation framework.
Onboarding to the validation framework is straightforward. You will need to create a new GitHub Actions workflow that calls the validate-domain-library.yml workflow. Here is an example taken from the TorchRec repo:
name: Validate binaries
on:
  workflow_call:
    inputs:
      channel:
        description: "Channel to use (nightly, release)"
        required: false
        type: string
        default: release
      ref:
        description: 'Reference to checkout, defaults to empty'
        default: ""
        required: false
        type: string
  workflow_dispatch:
    inputs:
      channel:
        description: "Channel to use (nightly, release)"
        required: true
        type: choice
        options:
          - release
          - nightly
      ref:
        description: 'Reference to checkout, defaults to empty'
        default: ""
        required: false
        type: string
jobs:
  validate-binaries:
    uses: pytorch/builder/.github/workflows/validate-domain-library.yml@main
    with:
      package_type: "wheel"
      os: "linux"
      channel: ${{ inputs.channel }}
      repository: "pytorch/torchrec"
      smoke_test: "./.github/scripts/validate_binaries.sh"
      with_cuda: enable
The binary build matrix contains the current configurations supported by PyTorch core and the domain libraries. It is generated by the following workflow: generate_binary_build_matrix.yml. For additional details refer to documentation here
The following CUDA and Python configurations are currently supported:
CUDA | cuDNN | Additional details |
---|---|---|
11.6 | 8.3.2.44 | Stable CUDA Release |
11.7 | 8.5.0.96 | Latest CUDA Release |
11.8 | 8.5.0.96 | CUDA Release Supported on nightly |
Python versions | Package details |
---|---|
3.7 | Supported on Conda and Pip |
3.8 | Supported on Conda and Pip |
3.9 | Supported on Conda and Pip |
3.10 | Supported on Conda and Pip |
3.11 | Supported on Pip only |
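
As a rough illustration of how these two tables combine into a matrix, the sketch below enumerates the CUDA builds they imply. It is not the logic of generate_binary_build_matrix.yml. Two things are assumptions: that "Pip" corresponds to the `wheel` package type, and that every build name follows the pattern of the one wheel name documented on this page (`wheel-py3_7-cuda11_7`), including for conda. CPU builds are not covered by the tables and are left out.

```python
from itertools import product

# Values copied from the tables above.
CUDA_VERSIONS = ["11.6", "11.7", "11.8"]
PYTHON_PACKAGE_TYPES = {
    "3.7": ["conda", "wheel"],
    "3.8": ["conda", "wheel"],
    "3.9": ["conda", "wheel"],
    "3.10": ["conda", "wheel"],
    "3.11": ["wheel"],  # 3.11 is supported on Pip only
}


def build_name(package_type, python_version, cuda_version):
    """Assumed naming pattern, inferred from 'wheel-py3_7-cuda11_7'."""
    py = python_version.replace(".", "_")
    cuda = cuda_version.replace(".", "_")
    return f"{package_type}-py{py}-cuda{cuda}"


def enumerate_builds():
    """List one build name per supported (package type, Python, CUDA) combination."""
    builds = []
    for (py, package_types), cuda in product(PYTHON_PACKAGE_TYPES.items(), CUDA_VERSIONS):
        for package_type in package_types:
            builds.append(build_name(package_type, py, cuda))
    return builds


if __name__ == "__main__":
    for name in enumerate_builds():
        print(name)
```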
The output of the generate workflow is a JSON array of entries, each containing the basic information needed to install and test the package. The following fields are supported:
{
  "python_version": "3.7",
  "gpu_arch_type": "cuda",
  "gpu_arch_version": "11.7",
  "desired_cuda": "cu117",
  "container_image": "pytorch/manylinux-builder:cuda11.7",
  "package_type": "wheel",
  "build_name": "wheel-py3_7-cuda11_7",
  "validation_runner": "windows.8xlarge.nvidia.gpu",
  "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
  "channel": "nightly",
  "upload_to_base_bucket": "no",
  "stable_version": "1.13.1"
}
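
A consumer of this output might filter the entries and pull out the install command for each one. The sketch below shows one way to do that. The sample data is a subset of the fields from the entry above, wrapped in an array; the helper name `select_entries` is hypothetical and not part of the framework.

```python
import json
import shlex

# A subset of the fields from the sample entry above, wrapped in an array.
MATRIX_JSON = """
[
  {
    "python_version": "3.7",
    "gpu_arch_type": "cuda",
    "gpu_arch_version": "11.7",
    "package_type": "wheel",
    "build_name": "wheel-py3_7-cuda11_7",
    "installation": "pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117",
    "channel": "nightly"
  }
]
"""


def select_entries(matrix, **criteria):
    """Return the entries whose fields equal every given key/value pair."""
    return [
        entry for entry in matrix
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


matrix = json.loads(MATRIX_JSON)
for entry in select_entries(matrix, gpu_arch_type="cuda", channel="nightly"):
    # shlex.split turns the command string into an argument list
    # that could be handed to subprocess.run.
    command = shlex.split(entry["installation"])
    print(entry["build_name"], "->", command)
```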