Commit

Merge branch 'master' into olruwase/no_grad_sync_ctxt
loadams authored Nov 12, 2024
2 parents a6d68b7 + 877aa0d commit 8ee6fbe
Showing 6 changed files with 9 additions and 9 deletions.
4 changes: 2 additions & 2 deletions docs/_tutorials/bert-finetuning.md
@@ -10,14 +10,14 @@ In this tutorial we will be adding DeepSpeed to the BingBert model for the SQuAD

If you don't already have a copy of the DeepSpeed repository, please clone in
now and checkout the DeepSpeedExamples submodule the contains the BingBertSquad
-example (DeepSpeedExamples/BingBertSquad) we will be going over in the rest of
+example (DeepSpeedExamples/training/BingBertSquad) we will be going over in the rest of
this tutorial.

```shell
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
git submodule update --init --recursive
-cd DeepSpeedExamples/BingBertSquad
+cd DeepSpeedExamples/training/BingBertSquad
```

### Pre-requisites
4 changes: 2 additions & 2 deletions docs/_tutorials/onebit-adam.md
@@ -136,7 +136,7 @@ You can also use a pre-trained BERT model checkpoint from either DeepSpeed, [Hug
### 2.1 Running BingBertSQuAD with DeepSpeed and 1-bit Adam
-We provide example scripts under [DeepSpeedExamples/BingBertSquad/1-bit_adam/](https://github.com/microsoft/DeepSpeedExamples/tree/master/BingBertSquad/1-bit_adam). There are 3 sets of scripts corresponding to NCCL-based implementation, MPI-based implementation on Ethernet systems, and MPI-based implementation on InfiniBand systems. For MPI-based implementation, we provide both example scripts when launching with deepspeed or mpirun.
+We provide example scripts under [DeepSpeedExamples/training/BingBertSquad/1-bit_adam/](https://github.com/microsoft/DeepSpeedExamples/tree/master/training/BingBertSquad/1-bit_adam). There are 3 sets of scripts corresponding to NCCL-based implementation, MPI-based implementation on Ethernet systems, and MPI-based implementation on InfiniBand systems. For MPI-based implementation, we provide both example scripts when launching with deepspeed or mpirun.
<!-- The main part of training is done in `nvidia_run_squad_deepspeed.py`, which has
already been modified to use DeepSpeed. The `run_squad_deepspeed.sh` script
@@ -157,7 +157,7 @@ To enable the 1-bit compressed training, 1-bit Adam uses an MPI library (E.g. MV
### Launch with deepspeed
-The following helper script in the DeepSpeedExamples/BingBertSQuAD will launch the training without the need for setting any `mpirun` parameters. The number of nodes and GPUs will be automatically detected and the job will be launched on all the available resources.
+The following helper script in the DeepSpeedExamples/training/BingBertSQuAD will launch the training without the need for setting any `mpirun` parameters. The number of nodes and GPUs will be automatically detected and the job will be launched on all the available resources.
```shell
bash run_squad_deepspeed_onebitadam.sh <PATH_TO_OUTPUT_DIR>
2 changes: 1 addition & 1 deletion tests/model/BingBertSquad/run_BingBertSquad.sh
@@ -93,7 +93,7 @@ done

# Validate path to BingBertSquad script
if [ -z "${BingBertSquad_DIR+x}" ]; then
-export BingBertSquad_DIR=../../../../DeepSpeedExamples/BingBertSquad
+export BingBertSquad_DIR=../../../../DeepSpeedExamples/training/BingBertSquad
echo "BingBertSquad_DIR environment variable not set; trying default: ${BingBertSquad_DIR}"
fi
validate_folder ${BingBertSquad_DIR} "BingBertSquad_DIR"
2 changes: 1 addition & 1 deletion tests/model/BingBertSquad/run_BingBertSquad_sanity.sh
@@ -94,7 +94,7 @@ done

# Validate path to BingBertSquad script
if [ -z "${BingBertSquad_DIR+x}" ]; then
-export BingBertSquad_DIR=../../../DeepSpeedExamples/BingBertSquad
+export BingBertSquad_DIR=../../../DeepSpeedExamples/training/BingBertSquad
echo "BingBertSquad_DIR environment variable not set; trying default: ${BingBertSquad_DIR}"
fi
validate_folder ${BingBertSquad_DIR} "BingBertSquad_DIR"
2 changes: 1 addition & 1 deletion tests/model/BingBertSquad/run_tests.sh
@@ -31,7 +31,7 @@ validate_folder() {

# Validate path to BingBertSquad script
if [ -z "${BingBertSquad_DIR+x}" ]; then
-export BingBertSquad_DIR=../../../DeepSpeedExamples/BingBertSquad
+export BingBertSquad_DIR=../../../DeepSpeedExamples/training/BingBertSquad
echo "BingBertSquad_DIR environment variable not set; trying default: ${BingBertSquad_DIR}"
fi
validate_folder ${BingBertSquad_DIR} "BingBertSquad_DIR"
4 changes: 2 additions & 2 deletions tests/model/BingBertSquad/test_e2e_squad.py
@@ -10,11 +10,11 @@
import pytest
import json

-sys.path.append("../../../DeepSpeedExamples/BingBertSquad")
+sys.path.append("../../../DeepSpeedExamples/training/BingBertSquad")
import evaluate as eval

squad_dir = "/data/BingBertSquad"
-base_dir = "../../../DeepSpeedExamples/BingBertSquad"
+base_dir = "../../../DeepSpeedExamples/training/BingBertSquad"

script_file_name = "run_squad_deepspeed.sh"
model_file_name = "training_state_checkpoint_162.tar"
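
Note on the shell hunks above: each test script falls back to the relocated default path only when `BingBertSquad_DIR` is unset. A minimal sketch of that `${VAR+x}` check follows, assuming the three-level relative default from `run_tests.sh`. `DEFAULT_DIR` and `resolve_squad_dir` are hypothetical names used for illustration and are not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: DEFAULT_DIR and resolve_squad_dir are illustrative names,
# not identifiers from the DeepSpeed test scripts.
DEFAULT_DIR="../../../DeepSpeedExamples/training/BingBertSquad"

resolve_squad_dir() {
    # ${BingBertSquad_DIR+x} expands to "x" when the variable is set
    # (even to an empty string) and to nothing when it is unset.
    if [ -z "${BingBertSquad_DIR+x}" ]; then
        echo "${DEFAULT_DIR}"
    else
        echo "${BingBertSquad_DIR}"
    fi
}
```

Under this check, setting `BingBertSquad_DIR` before invoking a script takes precedence over the default, including when it is set to an empty string.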
