With PyTorch and the Hugging Face Transformers library, I can get the KV cache like this:
# model and input_ids come from the usual AutoModelForCausalLM / AutoTokenizer setup
generated = model.generate(input_ids, max_new_tokens=1, return_dict_in_generate=True)
kv = generated['past_key_values']
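
(For context: depending on the transformers version, past_key_values is either a legacy tuple or a Cache object, but both support per-layer indexing, so you can inspect it like this:)

# one entry per layer; each entry is a (key, value) pair of tensors
# shaped (batch, n_heads, seq_len, head_dim)
print(len(kv))
key, value = kv[0]
print(key.shape, value.shape)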
How can I get the corresponding KV cache under llama.cpp?
I'd really appreciate any answers, thanks a million!
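
As far as I know, llama.cpp does not expose the KV cache as per-layer tensors through its public API; the closest equivalent is serializing the whole context state, which includes the KV cache. Here is a minimal sketch using the llama-cpp-python bindings; the model path and prompt are placeholders:

from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # placeholder path to a local GGUF model

# evaluate a prompt so the context's KV cache is populated
llm.eval(llm.tokenize(b"Hello, world"))

# save_state() serializes the whole context, KV cache included;
# llama.cpp treats the cache as an opaque blob, not per-layer tensors
state = llm.save_state()

Restoring with llm.load_state(state) puts the cache back so generation can continue without re-evaluating the prompt; on the C API side, the state functions in llama.h (e.g. llama_state_get_data / llama_state_save_file in recent versions) serve the same purpose.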
Replies: 1 comment

- I want to know also.