bugfix: inv_freq buffer in Llama RotaryEmbedding shouldn't be persistent #21
Conversation
* In a more recent version of Transformers, the inv_freq buffer is no longer persistent (huggingface/transformers@95f96b4).
* This causes a crash when someone tries to load a (converted) Llama checkpoint in NeMo (e.g. CodeLlama 7b).
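For context, here is a minimal PyTorch sketch, an illustration rather than the actual NeMo or Transformers source (the class name, arguments, and defaults are assumptions), of what registering inv_freq as a non-persistent buffer looks like:

```python
import torch
import torch.nn as nn

class RotaryEmbedding(nn.Module):
    """Illustrative rotary embedding module; names and defaults are assumptions."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of state_dict(), matching the
        # behavior of recent Transformers releases; the buffer is simply
        # recomputed in __init__ instead of being loaded from the checkpoint.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, seq_len: int) -> torch.Tensor:
        t = torch.arange(seq_len, device=self.inv_freq.device, dtype=self.inv_freq.dtype)
        freqs = torch.outer(t, self.inv_freq)     # [seq_len, dim // 2]
        return torch.cat((freqs, freqs), dim=-1)  # [seq_len, dim]
```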
Hi, does anyone have an update on this? Basically, loading any of the recent HF checkpoints that rely on rotary position embeddings (e.g. Mistral, Llama) will result in a crash:
Missing key(s) in state_dict: "model.language_model.encoder.layers.0.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.1.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.2.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.3.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.4.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.5.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.6.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.7.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.8.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.9.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.10.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.11.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.12.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.13.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.14.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.15.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.16.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.17.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.18.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.19.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.20.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.21.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.22.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.23.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.24.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.25.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.26.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.27.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.28.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.29.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.30.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.31.self_attention.core_attention.rotary_emb.inv_freq".
Because HF is not serializing inv_freq.
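For anyone puzzled by the error above, here is a small self-contained PyTorch sketch (hypothetical stand-in modules, not the NeMo code) showing why strict state_dict loading fails when the model registers inv_freq as a persistent buffer but the checkpoint was produced with a non-persistent one:

```python
import torch
import torch.nn as nn

class PersistentInvFreq(nn.Module):
    """Stands in for the NeMo-side module: inv_freq is persistent by default."""
    def __init__(self):
        super().__init__()
        self.register_buffer("inv_freq", torch.ones(4))

class NonPersistentInvFreq(nn.Module):
    """Stands in for the HF-side module: inv_freq is excluded from the checkpoint."""
    def __init__(self):
        super().__init__()
        self.register_buffer("inv_freq", torch.ones(4), persistent=False)

ckpt = NonPersistentInvFreq().state_dict()      # empty: inv_freq is not serialized
try:
    PersistentInvFreq().load_state_dict(ckpt)   # strict=True by default
except RuntimeError as err:
    print(err)   # Missing key(s) in state_dict: "inv_freq".
```

Registering the buffer with persistent=False on the NeMo side, as this PR does, removes inv_freq from the expected keys, so strict loading no longer complains about checkpoints that omit it.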
Can you resolve the merge conflicts? Then we can merge it.
Quick question: solving the conflict will involve replacing nemo/nemo/collections/nlp/modules/common/megatron/llama_module.py with nemo/nemo/collections/nlp/modules/common/megatron/falcon_module.py. I opened an issue about a bug in the Llama conversion scripts (basically they should aggregate …)
I noticed that in a more recent release you guys handle that during checkpoint loading. I created another PR for the issue with checkpoint conversion (#26), so I'll close this one.
Main issue was solved by another PR.
Issue #, if available: N/A
Description of changes: Switches off the persistent flag when registering the inv_freq buffer.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.