Hi there! 🤗
`FlashLlamaForCausalLM` uses the name `dense` for its MLP submodule, so when a user wants to employ a LoRA adapter, `get_mlp_weights` skips this submodule:
- `text-generation-inference/server/text_generation_server/models/custom_modeling/flash_llama_modeling.py`, line 440 (commit 6e32205)
- `text-generation-inference/server/text_generation_server/utils/adapter.py`, lines 259 to 261 (commit 6e32205)
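For context, the check at the referenced `adapter.py` lines looks roughly like the sketch below (paraphrased, with the body elided; the exact code may differ): it only picks up a submodule that is literally named `mlp`.

```python
def get_mlp_weights(i, layer):
    weights = {}
    if hasattr(layer, "mlp"):   # only a submodule literally named `mlp` matches
        mlp = layer.mlp
        ...                     # gate_proj / up_proj / down_proj weights are gathered here
    return weights              # stays empty for Llama, whose submodule is named `dense`
```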
This causes the error:

```
[rank0]: KeyError: (0, 'gate_proj')
```
This is not the case for `FlashGemma2ForCausalLM`, for example, which works properly. When I renamed `dense` to `mlp`, Llama worked as well.
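Until a fix lands, a possible workaround, equivalent in spirit to the rename described above, would be to alias the submodule before adapters are loaded. This is a hypothetical sketch only: the helper name and the `model.model.layers` attribute path are my assumptions, not a verified part of the TGI API.

```python
def alias_dense_as_mlp(model):
    """Hypothetical workaround: expose each layer's `dense` submodule under the
    name `mlp` so the existing hasattr(layer, "mlp") check in get_mlp_weights
    succeeds. The `model.model.layers` path is an assumption, not verified."""
    for layer in model.model.layers:
        if hasattr(layer, "dense") and not hasattr(layer, "mlp"):
            layer.mlp = layer.dense
```

Calling this once after model construction and before adapter loading should let `get_mlp_weights` collect the Llama MLP weights.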
I faced the same error. It works successfully on release 2.3.0, but fails on 2.3.1 and 2.4.0.