How to support a model with a dynamic inference graph #9295
Unanswered
RunningLeon asked this question in Q&A
Replies: 0 comments
Hi, thanks for your attention. I want to support a model in llama.cpp that uses different compute graphs for the prefilling and decoding stages. Does llama.cpp support a dynamic inference graph (e.g., skipping some layers during the prefill stage)? If so, how can this be done?
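To make the idea concrete, here is a minimal standalone C++ sketch (not actual llama.cpp or ggml code; the skip rule and names like `is_prefill` and `skip_in_prefill` are made up for illustration) of what I mean by a dynamic graph: the set of layers included when the graph is built depends on whether the batch is prefill or decode.

```cpp
// Standalone sketch: build a different "graph" for prefill vs. decode.
// This is NOT llama.cpp/ggml code; it only illustrates choosing which
// layers to include when the graph is (re)built for each batch.
#include <cstdio>
#include <vector>

struct GraphNode {
    int layer; // which transformer layer this node represents
};

// Build the list of layers to run. In prefill we skip every other layer
// (an arbitrary, hypothetical rule just for the example).
std::vector<GraphNode> build_graph(int n_layers, bool is_prefill) {
    std::vector<GraphNode> graph;
    for (int il = 0; il < n_layers; ++il) {
        const bool skip_in_prefill = (il % 2 == 1); // hypothetical skip rule
        if (is_prefill && skip_in_prefill) {
            continue; // layer is not added to the prefill graph
        }
        graph.push_back({il});
    }
    return graph;
}

int main() {
    const int n_layers = 8;

    // Prefill: a batch of prompt tokens, reduced graph.
    auto prefill_graph = build_graph(n_layers, /*is_prefill=*/true);
    std::printf("prefill graph has %zu layers\n", prefill_graph.size());

    // Decode: one token at a time, full graph.
    auto decode_graph = build_graph(n_layers, /*is_prefill=*/false);
    std::printf("decode graph has %zu layers\n", decode_graph.size());

    return 0;
}
```

The question is essentially whether llama.cpp allows this kind of per-stage branching during graph construction, and if so, where the right place to hook it in would be.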