Any plan on updating the code for LLaMA models? #128
Comments
Doesn't it work as a decoder model?
@Bachstelze, this is good news.
The code is similar to the GPT example in this repo:
Loading and processing already take 30 GB of RAM. My machine starts to swap at that point, so I just save the HTML and visualize it once the RAM is free again. The output looks very repetitive. In the case of Vicuna (lmsys/vicuna-7b-delta-v1.1; "From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning"), all heads show the same weight shape. Because of the one-directional objective of GPTs, every token can only attend to the tokens before it, e.g. the first token can only attend to the start-of-sentence token and itself. Interestingly, every token spreads its weight equally over all the tokens it can attend to. Therefore, the attention weights are strongest for the tokens at the beginning and decrease towards the end. This produces an "L" shape like the positive multiplicative inverse (1/x). Let me know if you find other patterns or have a good explanation for this phenomenon.
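To make the "L" shape described above concrete, here is a minimal pure-Python sketch. It does not load Vicuna or use bertviz; it only assumes the idealized case from the comment, where each token splits its attention evenly over itself and all earlier tokens. The function names and the sequence length `n` are made up for illustration.

```python
def uniform_causal_attention(n):
    """Idealized causal attention: row i spreads its weight evenly over positions 0..i."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def attention_received(weights):
    """Column sums: total attention each position receives from all query tokens."""
    n = len(weights)
    return [sum(weights[i][j] for i in range(n)) for j in range(n)]


n = 6  # hypothetical sequence length, chosen only for illustration
attn = uniform_causal_attention(n)

# Weight that query token i puts on the first token: 1, 1/2, 1/3, ... (the 1/x curve).
weight_on_first = [attn[i][0] for i in range(n)]

# Total attention received per position: largest at the start, smallest at the end.
received = attention_received(attn)

for j, r in enumerate(received):
    print(f"position {j}: weight_on_first={weight_on_first[j]:.3f} received={r:.3f}")
```

Under this assumption, early positions collect weight from every later token while late positions collect weight from only a few, which would give the decreasing pattern even without any learned preference for the start of the text.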
@Bachstelze Thank you so much <3
Hi! I am also trying to use bertviz with LLMs. Have you managed to see not only the self-attention of the first iteration but also the attention for the generated words, using the model.generate method?
Have you solved the problem? Thank you!
Hi, I finally ended up using captum and it works perfectly!
Thank you for the information! I find this works as well: https://github.com/mattneary/attention. I will try using captum, thank you!
@Icamd Does https://github.com/mattneary/attention work well with bigger GPTs? Do you know how the attention weights are aggregated into one view? @MarioRicoIbanez Can we use captum to view the attention pattern?
Most of the time, the Llama 3 model sinks all attention into the "begin of text" token.
@Bachstelze, can you please clarify the part where you said:
Thank you for the great repo.
Is there any plan on your side to update the code for LLaMA models? Or is there anything I can do to update the code to visualize a LLaMA model?