Commit

Add documentation about kicking off distributed jobs
Signed-off-by: Dashiell Stander <[email protected]>
dashstander committed Sep 28, 2023
1 parent a982ab0 commit e9d4bf3
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -119,6 +119,10 @@ If using MPI then you must have:
}
```

With your environment properly set up and the correct configuration files in place, you can use `deepy.py` like a normal Python script and start (for example) a training job with:

`python3 deepy.py train.py /path/to/configs/my_model.yml`

#### SLURM

Using SLURM can be slightly more involved, although parts of the setup are similar. You must add the following to your config:
@@ -167,7 +171,7 @@ export COUNT_NODE=`scontrol show hostnames "$SLURM_JOB_NODELIST" | wc -l`
./write_hostfile.sh
export DLTS_HOSTFILE=/sample/path/to/hostfiles/hosts_$SLURM_JOBID

-python3 deepy.py train.py /sample/path/to/your/configs/cfg.yml
+python3 deepy.py train.py /sample/path/to/your/configs/my_model.yml

```
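
The launch step in the script above passes a config path straight to `deepy.py`. As a minimal sketch, a small wrapper could check that the config file exists before launching, so that a mistyped path fails immediately rather than after the job has been scheduled. The function name `launch_training` and the `DRY_RUN` variable are hypothetical and introduced here only for illustration; they are not part of the repository.

```shell
# Hypothetical helper, not part of the repository.
# Usage: launch_training /path/to/configs/my_model.yml
launch_training() {
    local config="$1"

    # Fail fast if the config path is wrong.
    if [ ! -f "$config" ]; then
        echo "config not found: $config" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (assumed convenience flag).
        echo "python3 deepy.py train.py $config"
    else
        # Same invocation as in the README; only the config path varies.
        python3 deepy.py train.py "$config"
    fi
}
```

Running `DRY_RUN=1 launch_training /path/to/configs/my_model.yml` would print the command without starting a job, which can be handy for checking a batch script before submitting it.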
