Are there any plans to implement concurrent LoRA inference with multiple adapters (such as S-LoRA)? #1237
SamGalanakis asked this question in Q&A (unanswered)
-
Do you mean as suggested in #903? If yes, there are plans, and hopefully we can tackle it soon. But note that S-LoRA has a bunch of specialized optimizations that we cannot do in PEFT, since we want to support a very broad range of models and adapter types.
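For context, here is a minimal sketch in plain PyTorch of the core idea behind mixed-adapter batching. This is not PEFT's actual API; the class and names are hypothetical. The point is that the base weights are shared across the whole batch, and each sample is routed through its own LoRA adapter via a small low-rank correction:

```python
import torch
import torch.nn as nn


class MultiAdapterLinear(nn.Module):
    """Hypothetical layer: shared frozen base weights plus several named
    LoRA adapters, with per-sample adapter routing at inference time."""

    def __init__(self, base: nn.Linear, rank: int, adapter_names: list[str]):
        super().__init__()
        self.base = base  # shared, frozen base weights
        self.lora_A = nn.ParameterDict({
            name: nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            for name in adapter_names
        })
        self.lora_B = nn.ParameterDict({
            name: nn.Parameter(torch.zeros(base.out_features, rank))
            for name in adapter_names
        })

    def forward(self, x: torch.Tensor, adapter_names: list[str]) -> torch.Tensor:
        # One dense base matmul for the whole batch ...
        out = self.base(x)
        # ... then a low-rank correction per adapter, grouping samples
        # so each adapter's weights are applied in a single small matmul.
        for name in set(adapter_names):
            idx = [i for i, n in enumerate(adapter_names) if n == name]
            sub = x[idx]
            out[idx] += sub @ self.lora_A[name].T @ self.lora_B[name].T
        return out


layer = MultiAdapterLinear(nn.Linear(16, 16), rank=4, adapter_names=["a", "b"])
x = torch.randn(3, 16)
print(layer(x, adapter_names=["a", "b", "a"]).shape)  # torch.Size([3, 16])
```

The grouping keeps one dense base matmul per batch plus one small low-rank matmul per distinct adapter in the batch; this routing step is roughly the part S-LoRA optimizes much more aggressively, with custom kernels and unified paging of adapter weights.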
-
Ah, my bad, I missed that one. So that will allow parallel inference. Any rough idea how performant it will be, throughput- and memory-wise?
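On the memory side, a rough back-of-envelope (hypothetical model sizes, not a benchmark) suggests the adapter weights themselves are tiny next to the base model, so holding several adapters resident is cheap; the throughput cost comes mainly from the extra per-adapter matmuls:

```python
# Each LoRA adapter on a linear layer of shape (d_in, d_out) adds
# r * (d_in + d_out) parameters.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

# Hypothetical 7B-like config: 32 layers, d_model = 4096, LoRA rank 8
# applied to the four attention projections of each layer.
per_layer = 4 * lora_params(4096, 4096, rank=8)
total = 32 * per_layer
print(f"{total / 1e6:.1f}M params per adapter")  # ~8.4M
print(f"{total * 2 / 1e6:.1f} MB in fp16")       # ~16.8 MB
```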
-
This would be very useful, and there doesn't seem to be a flexible implementation of it yet.