NVIDIA Megatron-LM Q&A Discussions
🙏 Q&A Discussions
Ask the community for help
- 🙏 [QUESTION] Why does it take so long to sync up barrier information between ranks? (stale: no activity in 60 days on issue or PR)
- 🙏 [QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be maintained as torch.float32? (stale)
- 🙏 [QUESTION] Why is the time of one iteration in nsys longer than that in the output log? (stale)
- 🙏 [QUESTION] What is the difference between running pretrain_gpt.py with and without the mcore model? (stale)
- 🙏 Does Megatron have plans to support Gemma? (stale)
- 🙏 How to convert a Llama-2 Hugging Face checkpoint to the Megatron format (stale)
- 🙏 [QUESTION] What are the retrieval datasets when evaluating downstream tasks? (stale)
- 🙏 [QUESTION] Megatron-LM installation with CUDA 11.6 (stale)
- 🙏 [QUESTION] Why can't forward_backward_pipelining_without_interleaving enable config.overlap_p2p_comm? (stale)
- 🙏 [QUESTION] How to release the model and optimizer memory manually? (stale)
- 🙏 [QUESTION] How to set --rotary-seq-len-interpolation-factor for rope scaling? (stale)
- 🙏 [QUESTION] How to re-initialize the process group after destroy_process_group()? (stale)
- 🙏 How to split the dataset when running pretrain_bert.py (stale)
- 🙏 [QUESTION] Why write a special LinearWithFrozenWeight? (stale)
- 🙏 Question about test_global_memory_buffer (stale)