additional checks
AI-WAIFU committed Sep 24, 2024
1 parent 4022bd9 commit 2959091
Showing 4 changed files with 15 additions and 0 deletions.
2 changes: 2 additions & 0 deletions configs/README.md
@@ -124,6 +124,8 @@ These can be set to any integer between `0` and `num_gpus`, and `num_gpus` must
# this should provide some speedup but takes a while to build, set to true if desired
"scaled_upper_triang_masked_softmax_fusion": false,
"train_iters": 320000,
# alternatively, use train_epochs to automatically determine the number of training iterations
#"train_epochs": 1,
```
An example of some basic settings used to configure your model's architecture and number of training steps.
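The README only says that `train_epochs` determines the iteration count automatically. As a rough illustration, assuming the count is derived by dividing the number of training samples by the global batch size and rounding up once per epoch (an assumption; the exact formula and rounding NeoX uses are not shown in this commit), the conversion could look like this. `iters_for_epochs` and its parameters are hypothetical names.

```python
import math


def iters_for_epochs(train_epochs: int, num_samples: int, global_batch_size: int) -> int:
    """Hypothetical helper: optimizer steps needed to see every sample train_epochs times.

    Assumes one step consumes global_batch_size samples and that a partial
    final batch still counts as a full step (hence the ceiling).
    """
    if train_epochs <= 0 or num_samples <= 0 or global_batch_size <= 0:
        raise ValueError("train_epochs, num_samples and global_batch_size must be positive")
    steps_per_epoch = math.ceil(num_samples / global_batch_size)
    return train_epochs * steps_per_epoch


# 1000 samples at 32 per step -> 32 steps per epoch, so 2 epochs -> 64 steps.
print(iters_for_epochs(2, 1000, 32))
```

Rounding per epoch rather than over the whole run keeps each epoch a whole number of steps; a different rounding choice would shift the total by at most one step per epoch.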

9 changes: 9 additions & 0 deletions configs/neox_arguments.md
@@ -1928,6 +1928,15 @@ Training Arguments
- **train_epochs**: int
Default = None
Number of epochs to run for training. Do not specify both train_epochs and train_iters.
Not currently compatible with data reweighting, pairwise datasets, or packing other than 'packed'.
- **eval_iters**: int
Default = 100
3 changes: 3 additions & 0 deletions megatron/data/data_utils.py
@@ -494,6 +494,9 @@ def validate_train_epochs(neox_args):

    if neox_args.train_data_weights and (not all(weight == 1.0 for weight in neox_args.train_data_weights)):
        raise ValueError("non-uniform train_data_weights are currently unsupported with train_epochs")

    if neox_args.dataset_impl != "gpt2":
        raise ValueError("non-gpt2 datasets are currently unsupported with train_epochs")


def build_train_valid_test_data_loaders(neox_args):
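The two checks in this hunk can be exercised in isolation. The sketch below mirrors them in a standalone function, `validate_train_epochs_sketch`, a hypothetical name that accepts any object exposing `train_epochs`, `train_data_weights` and `dataset_impl`. The early return when `train_epochs` is unset is an assumption; the rest of the real `validate_train_epochs` is not visible in this commit.

```python
from types import SimpleNamespace


def validate_train_epochs_sketch(args) -> None:
    """Hypothetical mirror of the checks added in this hunk (not the NeoX function)."""
    if args.train_epochs is None:
        return  # assumption: nothing to validate when train_iters is used instead

    # Weights are only allowed if every one is exactly 1.0 (i.e. no reweighting).
    weights = args.train_data_weights
    if weights and not all(w == 1.0 for w in weights):
        raise ValueError("non-uniform train_data_weights are currently unsupported with train_epochs")

    # Only the plain "gpt2" dataset implementation is accepted.
    if args.dataset_impl != "gpt2":
        raise ValueError("non-gpt2 datasets are currently unsupported with train_epochs")


# Uniform weights with the gpt2 dataset pass without raising.
validate_train_epochs_sketch(
    SimpleNamespace(train_epochs=1, train_data_weights=[1.0, 1.0], dataset_impl="gpt2")
)
```

Note that `train_data_weights=None` (or an empty list) short-circuits the first check, so only explicitly non-uniform weights are rejected.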
1 change: 1 addition & 0 deletions megatron/neox_arguments/neox_args.py
@@ -1193,6 +1193,7 @@ class NeoXArgsTraining(NeoXArgsTemplate):
    train_epochs: int = None
    """
    Number of epochs to run for training. Do not specify both train_epochs and train_iters.
    Not currently compatible with data reweighting, pairwise datasets, or packing other than 'packed'.
    """

    eval_iters: int = 100
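The `train_epochs` field is annotated `int` but defaults to `None`. A minimal sketch, using a hypothetical `TrainingArgsSketch` dataclass rather than the real `NeoXArgsTraining`, shows the same field typed as `Optional[int]` together with a `__post_init__` check that enforces the docstring's rule that `train_epochs` and `train_iters` are mutually exclusive. Where NeoX actually enforces that rule is not shown in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingArgsSketch:
    """Hypothetical stand-in for two NeoX training-length fields."""

    train_iters: Optional[int] = None
    """Number of training iterations (hypothetical default)."""

    train_epochs: Optional[int] = None
    """Number of epochs to run for training. Do not specify both train_epochs and train_iters."""

    def __post_init__(self) -> None:
        # Reject configs that set both training-length options at once.
        if self.train_iters is not None and self.train_epochs is not None:
            raise ValueError("specify only one of train_iters and train_epochs")


args = TrainingArgsSketch(train_epochs=1)
print(args.train_epochs, args.train_iters)
```

`Optional[int]` states explicitly that `None` means "not set", which is how the original field is used; the string literals after each field are the same attribute-docstring convention the diff uses.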