Is it possible to separate the predict method for the base model and a PEFT-trained add-on? #628
truythu169 started this conversation in General
I know it depends on the PEFT training method, but with a method like LoRA, we should be able to separate the inference for the base model and the newly trained add-on, and then combine the two outputs into the final prediction.
Here is the scenario for why I need to separate these two inference passes: I want to run the expensive forward pass through the base BERT model once, and then apply several different, cheap LoRA add-ons on top of that single base prediction.
I wonder, is this possible in the current version? Or is there any method to customize or contribute this function to the repository?
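For a single linear layer, the split being asked about does hold mathematically, since Wx + s·B(Ax) = (W + s·BA)x. Here is a minimal PyTorch sketch of that per-layer decomposition (toy dimensions and variable names of my own, not PEFT's internals):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical dimensions, chosen only for illustration.
d_model, r, alpha = 768, 8, 16

base = nn.Linear(d_model, d_model, bias=False)    # frozen base weight W
lora_A = nn.Linear(d_model, r, bias=False)        # LoRA down-projection A
lora_B = nn.Linear(r, d_model, bias=False)        # LoRA up-projection B
scaling = alpha / r

x = torch.randn(1, d_model)

# For one isolated linear layer, the split really does work:
y_base = base(x)                                  # expensive part, run once
y_delta = scaling * lora_B(lora_A(x))             # cheap per-adapter part
y_split = y_base + y_delta

# ...and it matches a single pass through the merged weight W + s*B@A.
merged = base.weight + scaling * (lora_B.weight @ lora_A.weight)
assert torch.allclose(y_split, x @ merged.T, atol=1e-5)
```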
Replies: 1 comment
-
IIUC, you would like to make one expensive inference call to the full BERT model and then add multiple, cheap LoRA modifications on top of the BERT prediction. If that understanding is correct, then it wouldn't work. The issue is that LoRA modifies the intermediate outputs of the BERT model along the way: after the first transformer block, the activations are already different, so every subsequent block has to be re-computed. LoRA outputs cannot simply be added on top of the base BERT model's output.
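To make this concrete, here is a toy sketch (plain PyTorch, a hypothetical two-block model, not PEFT's API) showing that once a nonlinearity sits between two adapted layers, adding independently computed LoRA terms onto the cached base output no longer reproduces the adapted forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r = 16, 2

# Two stacked "blocks" (linear + ReLU) standing in for transformer layers.
W1, W2 = nn.Linear(d, d), nn.Linear(d, d)
# One LoRA pair per block; B gets random weights here to mimic a *trained*
# adapter (real LoRA initializes B to zero before training).
A1, B1 = nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False)
A2, B2 = nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False)

def base_forward(x):
    return torch.relu(W2(torch.relu(W1(x))))

def lora_forward(x):
    # Block 1's output is already modified by its adapter...
    h = torch.relu(W1(x) + B1(A1(x)))
    # ...so block 2 (and its adapter) operate on a different input.
    return torch.relu(W2(h) + B2(A2(h)))

x = torch.randn(1, d)

# Naive attempt: reuse cached base activations and just add LoRA deltas.
h_base = torch.relu(W1(x))
naive = base_forward(x) + B1(A1(x)) + B2(A2(h_base))

print(torch.allclose(lora_forward(x), naive))  # False
```

The split only works inside each individual linear layer; end to end, every block after the first must be re-run per adapter.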