Why does memory usage not change with different inputs when using the GGML format? #566
Unanswered
SiraHaruethaipree
asked this question in
Q&A
Replies: 0 comments
I don't know much about the GGML format. I do know that GPU VRAM usage normally depends on the input sequence: a longer input increases memory usage, as I saw when testing with the load_in_8bit and load_in_4bit options from Hugging Face. So I would like to understand why memory usage always stays at the same value when using the GGML format on a GPU. Could someone please explain?
Thanks
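
To make the comparison in the question concrete, here is a rough sketch of one possible explanation. It assumes the GGML runtime reserves its key/value cache for the whole context window when the model is loaded (as llama.cpp does), while the Hugging Face path grows the cache with the sequence. The model dimensions (32 layers, hidden size 4096, 16-bit cache values) and the 2048-token window are illustrative assumptions, not values from this thread, and the function names are hypothetical.

```python
# Illustrative arithmetic only: all sizes below are assumed, not measured.

def kv_cache_bytes(n_tokens, n_layers=32, n_embd=4096, bytes_per_value=2):
    """Bytes needed to hold keys and values for n_tokens positions."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_embd * bytes_per_value * n_tokens

N_CTX = 2048  # assumed context window configured at load time


def preallocated_usage(prompt_len, n_ctx=N_CTX):
    """Cache reserved once for the full window; prompt_len has no effect."""
    return kv_cache_bytes(n_ctx)


def dynamic_usage(prompt_len):
    """Cache that grows with the number of tokens processed so far."""
    return kv_cache_bytes(prompt_len)


if __name__ == "__main__":
    mib = 1024 ** 2
    for prompt_len in (128, 512, 2048):
        fixed = preallocated_usage(prompt_len) / mib
        grown = dynamic_usage(prompt_len) / mib
        print(f"prompt={prompt_len:5d}  preallocated={fixed:7.1f} MiB  dynamic={grown:7.1f} MiB")
```

Under these assumptions the preallocated figure is identical for every prompt length, which would match a flat VRAM reading, while the dynamic figure rises with the input. Weights and scratch buffers are left out of the sketch.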