
bugfix: inv_freq buffer in Llama RotaryEmbedding shouldn't be persistent #21

Closed

Conversation

evellasques
Contributor

  • In a more recent version of Transformers, the inv_freq buffer is no longer persistent (huggingface/transformers@95f96b4)
  • This causes a crash when someone tries to load a (converted) Llama checkpoint in NeMo (e.g. CodeLlama 7b)

Issue #, if available: N/A

Description of changes: Switches off the persistent flag when registering the inv_freq buffer (a minimal sketch of the change follows below).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

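For context, here is a minimal sketch of the change being described, using a simplified rotary-embedding module in plain PyTorch (the class below is illustrative, not the actual NeMo code):

```python
import torch
from torch import nn


class RotaryEmbedding(nn.Module):
    """Simplified rotary embedding; only the buffer registration matters here."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of the state_dict, matching the
        # newer Transformers behaviour, so checkpoints that omit it still load.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, seq_len: int) -> torch.Tensor:
        t = torch.arange(seq_len, dtype=self.inv_freq.dtype, device=self.inv_freq.device)
        freqs = torch.outer(t, self.inv_freq)
        return torch.cat((freqs, freqs), dim=-1)
```

Since inv_freq is recomputed in __init__, nothing is lost by not serializing it.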
@evellasques
Contributor Author

Hi,

Does anyone have an update on this? Basically, loading any of the recent HF checkpoints that rely on rotary position embeddings (e.g. Mistral, Llama) will result in a crash:

 Missing key(s) in state_dict: "model.language_model.encoder.layers.0.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.1.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.2.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.3.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.4.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.5.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.6.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.7.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.8.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.9.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.10.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.11.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.12.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.13.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.14.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.15.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.16.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.17.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.18.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.19.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.20.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.21.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.22.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.23.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.24.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.25.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.26.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.27.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.28.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.29.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.30.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.31.self_attention.core_attention.rotary_emb.inv_freq". 

This happens because HF no longer serializes inv_freq.
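To make the failure mode concrete, here is a small self-contained repro in plain PyTorch (not the NeMo module); the empty dict stands in for an HF checkpoint that no longer stores inv_freq:

```python
import torch
from torch import nn


class WithPersistentBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # Default persistent=True: inv_freq is part of the state_dict and is
        # therefore expected in every checkpoint.
        self.register_buffer("inv_freq", torch.ones(4))


class WithNonPersistentBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent=False: inv_freq is recomputed at construction time and
        # never read from a checkpoint.
        self.register_buffer("inv_freq", torch.ones(4), persistent=False)


checkpoint = {}  # stands in for a checkpoint without inv_freq

try:
    WithPersistentBuffer().load_state_dict(checkpoint)
except RuntimeError as e:
    print(e)  # Missing key(s) in state_dict: "inv_freq".

WithNonPersistentBuffer().load_state_dict(checkpoint)  # loads cleanly
```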

@aws-singhada

aws-singhada commented Apr 4, 2024 via email

@aws-kingrj
Contributor

Can you resolve the merge conflicts? Then we can merge it

@evellasques
Contributor Author

evellasques commented Apr 4, 2024

Can you resolve the merge conflicts? Then we can merge it

Quick question: resolving the conflict will involve replacing nemo/nemo/collections/nlp/modules/common/megatron/llama_module.py with nemo/nemo/collections/nlp/modules/common/megatron/falcon_module.py. I opened an issue about a bug in the Llama conversion scripts (basically they should aggregate the gate_proj and up_proj weights from HuggingFace Llama into a single dense_h_to_4h weight for SwiGLU). Do you want me to also fix that and add it as part of this PR?
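For reference, a rough sketch of the aggregation described above; the HF key names (gate_proj, up_proj) are standard Llama checkpoint keys, but the target dense_h_to_4h key layout and the concatenation order are assumptions, not taken from the actual conversion script:

```python
import torch


def merge_swiglu_weights(hf_state_dict: dict, num_layers: int) -> dict:
    """Concatenate HF Llama gate_proj and up_proj into one dense_h_to_4h weight
    per layer, so the SwiGLU implementation receives both halves in one matrix."""
    merged = {}
    for layer in range(num_layers):
        gate = hf_state_dict[f"model.layers.{layer}.mlp.gate_proj.weight"]
        up = hf_state_dict[f"model.layers.{layer}.mlp.up_proj.weight"]
        # The concatenation order must match how the SwiGLU activation splits
        # the projection downstream (assumed gate first, then up).
        merged[f"layers.{layer}.mlp.dense_h_to_4h.weight"] = torch.cat([gate, up], dim=0)
    return merged
```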

@evellasques
Contributor Author

Can you resolve the merge conflicts? Then we can merge it

I noticed that in a more recent release you handle that during checkpoint loading. I created another PR for the issue with checkpoint conversion (#26), so I'll close this one.

@evellasques
Contributor Author

The main issue was solved by another PR.
