add better instruction
samsja committed Sep 12, 2024
1 parent d999451 commit a5b7236
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion open_diloco/run_training.sh
@@ -8,7 +8,7 @@


# you can either pass a fixed initial peer or set it to auto and the script will start a dht server for you
- ## ./run_training.sh 4 1 auto --per-device-train-batch-size 8 --total-batch-size 128 --lr 1e-2 --path-model ../tests/models/llama-2m-fresh --project debug --no-torch-compile --hv.local-steps 100 --fake-data --hv.matchmaking_time 2
+ ## ./run_training.sh 4 1 auto --per-device-train-batch-size 8 --total-batch-size 128 --lr 1e-2 --path-model ../tests/models/llama-2m-fresh --project debug --no-torch-compile --hv.local-steps 100 --fake-data --hv.matchmaking_time 2 --hv.fail_rank_drop --hv.skip_load_from_peers

# Function to get CUDA devices based on the number of GPUs and index
function get_cuda_devices() {
