NVIDIA Megatron-LM Q&A Discussions
🙏 Q&A Discussions
Ask the community for help
[QUESTION] Why not use the tensor parallel APIs of PyTorch? (stale: no activity in 60 days)
Question about forward_backward_pipelining_without_interleaving in the Megatron-LM pipeline (stale: no activity in 60 days)
[QUESTION] How to profile bubble time in pipeline parallelism? (stale: no activity in 60 days)
[QUESTION] How does tensor_parallel work with q/k_layernorm? (stale: no activity in 60 days)
[QUESTION] Why is expert parallelism not supported during fp16 training? (stale: no activity in 60 days)
[QUESTION] Does Megatron-Core support LLaMA models? (stale: no activity in 60 days)
[QUESTION] How to pre-build the dataset's index? (stale: no activity in 60 days)
[QUESTION] bf16 parameters and fp32 gradients (stale: no activity in 60 days)
[QUESTION] Why does Megatron-Core seem slower and use more GPU memory than legacy for gpt_pretrain? (stale: no activity in 60 days)
[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class? (stale: no activity in 60 days)
Incorrect shuffling of documents across epochs in GPTDataset (stale: no activity in 60 days)
[QUESTION] Why must f and g be conjugates of each other? (stale: no activity in 60 days)