
bugfix: inv_freq buffer in Llama RotaryEmbedding shouldn't be persistent #21

Closed

Conversation

evellasques
Contributor

  • In a more recent version of Transformers, the inv_freq buffer is no longer persistent (huggingface/transformers@95f96b4)
  • This causes a crash when someone tries to load a (converted) Llama checkpoint in NeMo (e.g. CodeLlama 7b)

Issue #, if available: N/A

Description of changes: Switches off the persistent flag when registering the inv_freq buffer (a minimal sketch of the change follows below).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

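For context, here is a minimal sketch of the change being described, using a simplified rotary-embedding module in plain PyTorch (the class below is illustrative, not the actual NeMo code):

```python
import torch
from torch import nn


class RotaryEmbedding(nn.Module):
    """Simplified rotary embedding; only the buffer registration matters here."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of the state_dict, matching the
        # newer Transformers behaviour, so checkpoints that omit it still load.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, seq_len: int) -> torch.Tensor:
        t = torch.arange(seq_len, dtype=self.inv_freq.dtype, device=self.inv_freq.device)
        freqs = torch.outer(t, self.inv_freq)
        return torch.cat((freqs, freqs), dim=-1)
```

Since inv_freq is recomputed in __init__, nothing is lost by not serializing it.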
@evellasques
Contributor Author

Hi,

Does anyone have an update on this? Basically, loading any of the recent HF checkpoints that rely on rotary position embeddings (e.g. Mistral, Llama) will result in a crash:

 Missing key(s) in state_dict: "model.language_model.encoder.layers.0.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.1.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.2.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.3.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.4.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.5.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.6.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.7.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.8.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.9.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.10.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.11.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.12.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.13.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.14.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.15.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.16.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.17.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.18.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.19.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.20.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.21.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.22.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.23.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.24.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.25.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.26.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.27.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.28.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.29.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.30.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.31.self_attention.core_attention.rotary_emb.inv_freq". 

This happens because HF no longer serializes inv_freq.
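To make the failure mode concrete, here is a small self-contained repro in plain PyTorch (not the NeMo module); the empty dict stands in for an HF checkpoint that no longer stores inv_freq:

```python
import torch
from torch import nn


class WithPersistentBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # Default persistent=True: inv_freq is part of the state_dict and is
        # therefore expected in every checkpoint.
        self.register_buffer("inv_freq", torch.ones(4))


class WithNonPersistentBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent=False: inv_freq is recomputed at construction time and
        # never read from a checkpoint.
        self.register_buffer("inv_freq", torch.ones(4), persistent=False)


checkpoint = {}  # stands in for a checkpoint without inv_freq

try:
    WithPersistentBuffer().load_state_dict(checkpoint)
except RuntimeError as e:
    print(e)  # Missing key(s) in state_dict: "inv_freq".

WithNonPersistentBuffer().load_state_dict(checkpoint)  # loads cleanly
```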

@aws-singhada

aws-singhada commented Apr 4, 2024 via email

@aws-kingrj
Contributor

Can you resolve the merge conflicts? Then we can merge it

@evellasques
Contributor Author

evellasques commented Apr 4, 2024

Can you resolve the merge conflicts? Then we can merge it

Quick question: resolving the conflict will involve replacing nemo/nemo/collections/nlp/modules/common/megatron/llama_module.py with nemo/nemo/collections/nlp/modules/common/megatron/falcon_module.py. I opened an issue about a bug in the Llama conversion scripts (basically they should aggregate the gate_proj and up_proj weights from HuggingFace Llama into a single dense_h_to_4h weight for SwiGLU). Do you want me to also fix that and add it as part of this PR?
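For reference, a rough sketch of the aggregation described above; the HF key names (gate_proj, up_proj) are standard Llama checkpoint keys, but the target dense_h_to_4h key layout and the concatenation order are assumptions, not taken from the actual conversion script:

```python
import torch


def merge_swiglu_weights(hf_state_dict: dict, num_layers: int) -> dict:
    """Concatenate HF Llama gate_proj and up_proj into one dense_h_to_4h weight
    per layer, so the SwiGLU implementation receives both halves in one matrix."""
    merged = {}
    for layer in range(num_layers):
        gate = hf_state_dict[f"model.layers.{layer}.mlp.gate_proj.weight"]
        up = hf_state_dict[f"model.layers.{layer}.mlp.up_proj.weight"]
        # The concatenation order must match how the SwiGLU activation splits
        # the projection downstream (assumed gate first, then up).
        merged[f"layers.{layer}.mlp.dense_h_to_4h.weight"] = torch.cat([gate, up], dim=0)
    return merged
```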

@evellasques
Contributor Author

Can you resolve the merge conflicts? Then we can merge it

I noticed that in a more recent release you handle that during checkpoint loading. I created another PR for the issue with checkpoint conversion (#26), so I'll close this one.

@evellasques
Contributor Author

The main issue was solved by another PR.
