From 4a34e0a565f19a8578210654afcb3bb835fcc35e Mon Sep 17 00:00:00 2001
From: jaimemcc <99298642+jaimemcc-intel@users.noreply.github.com>
Date: Wed, 5 Jun 2024 15:26:54 -0700
Subject: [PATCH] init changes to README (#1232)

* init changes to README

* Update NeoXArgs docs automatically

* Update README.md

* Update NeoXArgs docs automatically

* Update README.md

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions
Co-authored-by: Quentin Anthony
---
 configs/neox_arguments.md |  2 +-
 tests/README.md           | 77 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/configs/neox_arguments.md b/configs/neox_arguments.md
index c60d1e15f..f6c3ecde3 100644
--- a/configs/neox_arguments.md
+++ b/configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments
 
 - **git_hash**: str
 
-    Default = 516169c
+    Default = 7aa0074
 
     current git hash of repository
 
diff --git a/tests/README.md b/tests/README.md
index 316096cc5..390a52898 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -32,7 +32,7 @@ pytest --forked tests/model/test_model_generation.py
 
 Some tests can run on cpu only. These are marked with the decorator @pytest.mark.cpu.
 The test cases for cpu can be run with:
-````
+```
 pytest tests -m cpu
 ```
 
@@ -49,3 +49,78 @@ if You see this kind of error:
 RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
 ```
 It usually means that you used some pytorch.cuda function before the test creates the processes. However just importing `from torch.utils import cpp_extension` can also trigger this.
+
+
+## CPU Test Integration
+
+Tests can be run against physical CPUs through GitHub Actions. To have tests run on the physical CPU test, here is generally how the CI should be written:
+
+### runs-on
+
+The CI needs to be written to target the CPU Github Action runner. The jobs that need to run on CPU should use the hardware runner's labels:
+```yaml
+jobs:
+  cpu-test-job:
+    runs-on: [ 'self-hosted', 'aws', 'test'] # these labels tell GitHub to execute on the runner with the 'aws' and 'test' labels
+```
+
+### Software dependencies
+
+Hardware tests that need python and docker should install them as part of the test execution to make sure the tests run as expected:
+```yaml
+steps:
+  # sample syntax to setup python with pip
+  - uses: actions/setup-python@v4
+    with:
+      python-version: "3.8"
+      cache: "pip"
+
+  # sample setup of docker (there's no official Docker setup action)
+  - name: Docker setup
+    run: | # taken from Docker's installation page: https://docs.docker.com/engine/install/ubuntu/
+      # Add Docker's official GPG key:
+      sudo apt-get update
+      sudo apt-get install ca-certificates curl
+      sudo install -m 0755 -d /etc/apt/keyrings
+      sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+      sudo chmod a+r /etc/apt/keyrings/docker.asc
+      # Add the repository to Apt sources:
+      echo \
+        "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
+        $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
+        sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+      sudo apt-get update
+      sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
+```
+
+Any other software dependencies should be assumed to be missing and installed as part of the CI.
+
+### Using Docker image
+
+Using the Docker image and running tests in a container is recommended to resolve environment issues. There is a modified docker-compose.yml in tests/cpu_tests directory that is recommended to be used for CPU tests:
+
+```bash
+cp tests/cpu_tests/docker-compose.yml .
+# export any env variables here that should be used:
+export NEOX_DATA_PATH='./data/enwik8'
+docker compose run -d --build --name $CONTAINER gpt-neox tail -f /dev/null
+# then can set up and run tests in the container using docker exec
+docker exec $CONTAINER pip install -r /workspace/requirements-dev.txt
+# etc.
+# please clean up the container as part of the CI:
+docker rm $CONTAINER
+```
+
+At the time of writing there is no built-in method to provide an offline-built Docker image to `jobs..container`.
+
+### Using existing CPU test CI
+
+There is an existing CPU test workflow that can be included in existing CI:
+
+```yaml
+steps:
+  - name: Run CPU Tests
+    uses:
+      target_test_ref: $GITHUB_REF # replace with the ref/SHA that the tests should be run on
+      # have a look at the reusable workflow here: https://github.com/EleutherAI/gpt-neox/blob/main/tests/cpu_tests/action.yml
+```
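
The pieces described in the added README section (runner labels, the container started from `tests/cpu_tests/docker-compose.yml`, and cleanup) could be combined into a single job roughly as follows. This is only a minimal sketch and is not part of the patch: the workflow name, the trigger, the `CONTAINER` naming scheme, and the assumption that the repository is available at `/workspace` inside the container are all hypothetical. The Docker installation step from the README is omitted for brevity and would still be needed on a runner without Docker.

```yaml
# Hypothetical workflow sketch; names and trigger are assumptions, not taken from the patch.
name: cpu-tests
on: [pull_request]

jobs:
  cpu-test-job:
    runs-on: ['self-hosted', 'aws', 'test']   # runner labels as given in the README text
    env:
      # Assumed naming scheme: the README uses $CONTAINER without defining it.
      CONTAINER: neox-cpu-test-${{ github.run_id }}
      NEOX_DATA_PATH: ./data/enwik8
    steps:
      - uses: actions/checkout@v4

      - name: Start container
        run: |
          cp tests/cpu_tests/docker-compose.yml .
          docker compose run -d --build --name "$CONTAINER" gpt-neox tail -f /dev/null

      - name: Run CPU tests
        run: |
          docker exec "$CONTAINER" pip install -r /workspace/requirements-dev.txt
          # Assumes the repository is available at /workspace inside the container.
          docker exec -w /workspace "$CONTAINER" pytest tests -m cpu

      - name: Clean up container
        if: always()                     # run even when an earlier step failed
        run: docker rm -f "$CONTAINER"   # -f because the container is still running `tail -f`
```

Two choices in the sketch differ from the README snippet. Plain `docker rm` refuses to remove a running container, and a container started with `tail -f /dev/null` keeps running, so the sketch uses `docker rm -f`. The cleanup step is also guarded with `if: always()` so that a failed test step does not leave the container behind on the self-hosted runner.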