
FlashLlamaForCausalLM using the name dense for its MLP submodule causes an error when using a LoRA adapter #2715

Open
sadra-barikbin opened this issue Nov 2, 2024 · 1 comment

Comments

@sadra-barikbin (Contributor)

Hi there! 🤗

FlashLlamaForCausalLM uses the name dense for its MLP submodule, so when a user wants to employ a LoRA adapter, get_mlp_weights skips this submodule:

    def get_mlp_weights(i, layer):
        weights = {}
        if hasattr(layer, "mlp"):  # never true for FlashLlama, whose submodule is named `dense`
            ...

This causes the following error:

[rank0]: KeyError: (0, 'gate_proj')

This is not the case for FlashGemma2ForCausalLM, for example, which works properly. When I renamed dense to mlp, Llama worked as well.
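For what it's worth, a lookup tolerant of both attribute names would also avoid the skip. Below is only a minimal sketch, assuming the (layer_index, proj_name) keys seen in the traceback; the projection names and the returned weight tuples are illustrative, not TGI's actual implementation:

    def get_mlp_weights(i, layer):
        # Sketch: accept either attribute name instead of only "mlp".
        weights = {}
        # FlashLlamaForCausalLM stores the submodule as `dense`; most other
        # flash models (e.g. FlashGemma2ForCausalLM) use `mlp`.
        mlp = getattr(layer, "mlp", None) or getattr(layer, "dense", None)
        if mlp is None:
            return weights
        for name in ("gate_proj", "up_proj", "down_proj"):
            proj = getattr(mlp, name, None)
            if proj is not None:
                # Keys mirror the (layer_index, proj_name) tuples from the KeyError.
                weights[(i, name)] = (f"model.layers.{i}.mlp.{name}", proj)
        return weights

Renaming dense to mlp in the model (as tried above) is the other option; either way the fix is about making the submodule name and the lookup in get_mlp_weights agree.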

@koutarou-n commented Nov 5, 2024

I faced the same error.
It works successfully on release 2.3.0, but fails on 2.3.1 and 2.4.0.
