Bump 3rdparty/Megatron-LM from 2da43ef to 65720c8 (#579)
Bumps [3rdparty/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) from
`2da43ef` to `65720c8`.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/65720c87ba9c9d0ae8c90b1ffdbdccd2d51b1bc1"><code>65720c8</code></a>
Merge branch 'ko3n1g/chore/fix-local-generator-script' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/c8d12e647d47dabe6aed208198ff23c869df4599"><code>c8d12e6</code></a>
ADLR/megatron-lm!2519 - chore: Fix local generator script</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/ab171c572a833d5c86f9e25ce83f58f5b9dbb7d9"><code>ab171c5</code></a>
Merge branch 'ko3n1g/ci/use-torchrun' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/6e09dd4da0798d5e25eda1cc246eea9e0c54dee0"><code>6e09dd4</code></a>
ADLR/megatron-lm!2507 - ci: Use torchrun</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/df28200e53056604cad1829ec55bb9ba7c555f23"><code>df28200</code></a>
Merge branch 'generate_fix' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/342e35928bb2bbf6c49c03c62c286035b4d233d1"><code>342e359</code></a>
ADLR/megatron-lm!2370 - Make generate function only return results for
newly ...</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/15517f61ec279bcfe866807a9aa63556851ce142"><code>15517f6</code></a>
Merge branch 'ko3n1g/ci/update-nightlies' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/c383fe923ef3307d21299fe32503da4e9051aede"><code>c383fe9</code></a>
ADLR/megatron-lm!2511 - ci: Update golden values of nightlies</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/86e5481ca907e4906fd9e10f9b19dc217aac1e0e"><code>86e5481</code></a>
Merge branch 'video_training' into 'main'</li>
<li><a
href="https://github.com/NVIDIA/Megatron-LM/commit/82a6dfd142cbb5c060a02669d721a78e3a3bba77"><code>82a6dfd</code></a>
ADLR/megatron-lm!2500 - Video training</li>
<li>Additional commits viewable in <a
href="https://github.com/NVIDIA/Megatron-LM/compare/2da43ef4c1b9e76f03b7567360cf7390e877f1b6...65720c87ba9c9d0ae8c90b1ffdbdccd2d51b1bc1">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Peter St. John <[email protected]>
dependabot[bot] and pstjohn authored Jan 9, 2025
1 parent 4ce05ba commit 11b6895
Showing 5 changed files with 2 additions and 492 deletions.
2 changes: 1 addition & 1 deletion 3rdparty/Megatron-LM
Submodule Megatron-LM updated 441 files
365 changes: 0 additions & 365 deletions sub-packages/bionemo-esm2/src/bionemo/esm2/model/attention.py

This file was deleted.

4 changes: 1 addition & 3 deletions sub-packages/bionemo-esm2/src/bionemo/esm2/model/model.py
```diff
@@ -35,7 +35,6 @@
 from torch.optim import Optimizer
 
 from bionemo.esm2.data.tokenizer import BioNeMoESMTokenizer
-from bionemo.esm2.model.attention import ESM2DotProductAttention, ESM2TEDotProductAttention
 from bionemo.esm2.model.embedding import ESM2Embedding
 from bionemo.llm.api import MegatronLossType
 from bionemo.llm.model.biobert.model import BioBertConfig, MegatronBioBertModel, PositionEmbeddingKinds
@@ -294,6 +293,7 @@ class ESM2GenericConfig(BioBertConfig[ESM2ModelT, MegatronLossType]):
     bias_activation_fusion: bool = True  # True degrades accuracy slightly, but is faster.
     activation_func: Callable = F.gelu  # esm_gelu_func # ESM2 MLP
     init_method_std: float = 0.02
+    softmax_scale: float = 1.0
 
     # embedding
     token_dropout: bool = True
@@ -346,13 +346,11 @@ def __post_init__(self):
         super().__post_init__()
         if self.biobert_spec_option == BiobertSpecOption.esm2_bert_layer_with_transformer_engine_spec:
             self.apply_query_key_layer_scaling = False
-            self.core_attention_override = ESM2TEDotProductAttention
         elif self.biobert_spec_option == BiobertSpecOption.esm2_bert_layer_local_spec:
             logging.warning(
                 "BiobertSpecOption.esm2_bert_layer_local_spec is depreciated. Use BiobertSpecOption.esm2_bert_layer_with_transformer_engine_spec instead."
             )
             self.apply_query_key_layer_scaling = True
-            self.core_attention_override = ESM2DotProductAttention
         else:
             raise ValueError(f"Unknown biobert_spec_option: {self.biobert_spec_option}")
 
```
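The net effect of these hunks: the custom ESM2 attention override classes are removed (along with the now-deleted `attention.py`), and attention scaling is carried by the plain `softmax_scale` config field instead. For readers unfamiliar with that knob, here is a minimal sketch of the role a `softmax_scale` multiplier typically plays in dot-product attention; it is illustrative only, not Megatron-LM's actual implementation, and the function name and signature are assumptions:

```python
import math

import torch


def scaled_dot_product_attention_sketch(q, k, v, softmax_scale: float = 1.0):
    """Hypothetical sketch of where a `softmax_scale` knob enters attention.

    Not the Megatron-LM implementation. q, k, v are (..., seq, head_dim) tensors.
    """
    # Usual 1/sqrt(head_dim) scaling, with softmax_scale as an extra
    # multiplier on the attention logits before the softmax.
    scores = (q @ k.transpose(-2, -1)) * (softmax_scale / math.sqrt(q.size(-1)))
    return torch.softmax(scores, dim=-1) @ v
```

With the default `softmax_scale=1.0` this reduces to ordinary scaled dot-product attention, so the default should leave attention behavior unchanged.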
(The remaining two changed files are not shown here.)
