-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HH scores summed along batch dimension #14
Comments
@yeoedward @Ying1123 @Kyriection Hi,is there an answer for the above question? Besides,I also want to know when bathcing inference is used for llama, how to update the hh_socre? |
Hi, The HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multi sequences shortly, by modifying (https://github.com/FMInference/H2O/blob/main/h2o_hf/utils_real_drop/modify_llama.py#L269) |
@Kyriection Thanks for your reply. I have changed the code to support batching inference, just as following. The recent_sze = 100, and hh_size = 24, works well for batch size = 1. However, when batch size is set to 2, the output is garbled(when the seq len is larger than 124(100+24)). Something wrong with the changed code?
|
Maybe I see it.
Just update the generation of k_hh_recent and v_hh_recent, the code works. |
The hh scores seem to be summed along the batch dimension, which is strange as they are sequence-dependent. Shouldn't separate hh scores be maintained for each sequence in a batch?
Code: https://github.com/FMInference/H2O/blob/main/h2o_hf/utils_real_drop/modify_llama.py#L132
Also, thanks for open sourcing your code!
The text was updated successfully, but these errors were encountered: