Describe the bug
Enabling FSDP doesn't work with transformer_engine.pytorch.optimizers.FusedAdam or apex.optimizers.FusedAdam; it requires torch.optim.AdamW, which isn't the default in /work/nvme/bddk/prathi3/Megatron-LM/megatron/core/optimizer/__init__.py.
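For concreteness, this is roughly what the working substitution looks like at the plain-PyTorch level. It is a minimal sketch, not Megatron-LM's optimizer wiring: a placeholder linear module stands in for the FSDP2-wrapped GPT model, and the hyperparameters mirror the repro command below.

```python
# Minimal stand-alone sketch (not Megatron-LM's code): stock torch.optim.AdamW
# driving a placeholder module.
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in; the real run wraps a GPT model with --use-torch-fsdp2

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-3,           # --lr 0.003
    weight_decay=0.1,  # --weight-decay 0.1
)

model(torch.randn(2, 1024)).sum().backward()
optimizer.step()  # the stock AdamW step completes; the fused apex/TE step crashes under FSDP2
```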
Expected behavior
The transformer_engine and apex fused optimizers should be disabled when FSDP is enabled.
Stack trace/logs
For Apex Adam:
[rank0]: Traceback (most recent call last):
[rank0]: File "Megatron-LM/pretrain_gpt.py", line 284, in <module>
[rank0]: pretrain(
[rank0]: File "Megatron-LM/megatron/training/training.py", line 376, in pretrain
[rank0]: iteration, num_floating_point_operations_so_far = train(
[rank0]: File "Megatron-LM/megatron/training/training.py", line 1431, in train
[rank0]: train_step(forward_step_func,
[rank0]: File "Megatron-LM/megatron/training/training.py", line 775, in train_step
[rank0]: update_successful, grad_norm, num_zeros_in_grad = optimizer.step()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "Megatron-LM/megatron/core/optimizer/optimizer.py", line 473, in step
[rank0]: success = self.step_with_ready_grads()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "Megatron-LM/megatron/core/optimizer/optimizer.py", line 430, in step_with_ready_grads
[rank0]: self.optimizer.step()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 478, in wrapper
[rank0]: out = func(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/apex/optimizers/fused_adam.py", line 293, in step
[rank0]: multi_tensor_applier(self.multi_tensor_adam,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 27, in __call__
[rank0]: return op(self.chunk_size,
[rank0]: RuntimeError: CUDA error: an illegal memory access was encountered
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
To Reproduce
/usr/local/bin/torchrun --max_restarts 1 --nproc_per_node 2 --nnodes 1 --node_rank 0 --master_addr {} --master_port {} --start_method spawn --rdzv_backend static --rdzv_endpoint {} --rdzv_conf 'distributed_backend=nccl' pretrain_gpt.py --num-layers 16 --hidden-size 2048 --ffn-hidden-size 8192 --num-attention-heads 32 --seq-length 8192 --max-position-embeddings 8192 --swiglu --train-iters 20 --eval-iters 1 --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --context-parallel-size 1 --use-torch-fsdp2 --no-gradient-accumulation-fusion --micro-batch-size 1 --global-batch-size 2 --save-interval 21 --log-interval 1 --log-throughput --logging-level 10 --lr 0.003 --lr-decay-iters 320000 --lr-decay-style cosine --min-lr 1.0e-5 --clip-grad 0.0 --lr-warmup-fraction .01 --weight-decay 0.1 --vocab-size 128256 --bf16 --use-flash-attn --use-mcore-models --untie-embeddings-and-output-weights --position-embedding-type rope --normalization LayerNorm --disable-bias-linear
on 2 A40 GPUs, from branch core_r0.10.0.
Environment (please complete the following information):
Proposed fix
Vanilla torch.optim.AdamW worked for me, so maybe make it the default when FSDP is enabled.
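For illustration, here is a minimal sketch of the kind of fallback I have in mind. The names (build_adam, use_torch_fsdp2, fused_adam_cls) are hypothetical and this is not Megatron-LM's actual optimizer factory; it only shows the idea of selecting stock torch.optim.AdamW whenever the FSDP2 path is active.

```python
# Hypothetical sketch, not Megatron-LM's code: use a fused Adam only when the
# FSDP2 path is off, otherwise fall back to stock torch.optim.AdamW.
from typing import Iterable, Optional, Type

import torch


def build_adam(
    params: Iterable[torch.nn.Parameter],
    lr: float,
    weight_decay: float,
    use_torch_fsdp2: bool,
    fused_adam_cls: Optional[Type[torch.optim.Optimizer]] = None,
) -> torch.optim.Optimizer:
    """Choose an Adam implementation that is safe for the current sharding mode."""
    if use_torch_fsdp2 or fused_adam_cls is None:
        # FSDP2-sharded parameters: avoid apex/TE multi-tensor kernels.
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    # Non-FSDP path: keep the fused optimizer for throughput.
    return fused_adam_cls(params, lr=lr, weight_decay=weight_decay)


# With FSDP2 enabled, the vanilla optimizer is selected:
module = torch.nn.Linear(8, 8)
opt = build_adam(module.parameters(), lr=3e-3, weight_decay=0.1, use_torch_fsdp2=True)
print(type(opt).__name__)  # -> AdamW
```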
Additional context
N/A