Is it possible to separate the predict method for the base model and a PEFT-trained add-on? #628
truythu169 started this conversation in General
I know it depends on the PEFT training method, but with a method like LoRA, we should be able to separate the inference for the base model and the newly trained add-on, and then combine the two outputs into the final prediction.
Here is the scenario for why I need to separate these two inference passes: I want to run the expensive forward pass through the base BERT model once, and then apply several different, cheap LoRA add-ons on top of that single base prediction.
I wonder, is this possible in the current version? Or is there any method to customize or contribute this function to the repository?
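For a single linear layer, the split being asked about does hold mathematically, since Wx + s·B(Ax) = (W + s·BA)x. Here is a minimal PyTorch sketch of that per-layer decomposition (toy dimensions and variable names of my own, not PEFT's internals):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical dimensions, chosen only for illustration.
d_model, r, alpha = 768, 8, 16

base = nn.Linear(d_model, d_model, bias=False)    # frozen base weight W
lora_A = nn.Linear(d_model, r, bias=False)        # LoRA down-projection A
lora_B = nn.Linear(r, d_model, bias=False)        # LoRA up-projection B
scaling = alpha / r

x = torch.randn(1, d_model)

# For one isolated linear layer, the split really does work:
y_base = base(x)                                  # expensive part, run once
y_delta = scaling * lora_B(lora_A(x))             # cheap per-adapter part
y_split = y_base + y_delta

# ...and it matches a single pass through the merged weight W + s*B@A.
merged = base.weight + scaling * (lora_B.weight @ lora_A.weight)
assert torch.allclose(y_split, x @ merged.T, atol=1e-5)
```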
Replies: 1 comment
-
IIUC, you would like to make one expensive inference call to the full BERT model and then add multiple, cheap LoRA modifications on top of the BERT prediction. If that understanding is correct, then it wouldn't work. The issue is that LoRA modifies the intermediate outputs of the BERT model along the way: after the first transformer block, the activations are already different, so every subsequent block has to be re-computed. LoRA outputs cannot simply be added on top of the base BERT model's output.
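To make this concrete, here is a toy sketch (plain PyTorch, a hypothetical two-block model, not PEFT's API) showing that once a nonlinearity sits between two adapted layers, adding independently computed LoRA terms onto the cached base output no longer reproduces the adapted forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, r = 16, 2

# Two stacked "blocks" (linear + ReLU) standing in for transformer layers.
W1, W2 = nn.Linear(d, d), nn.Linear(d, d)
# One LoRA pair per block; B gets random weights here to mimic a *trained*
# adapter (real LoRA initializes B to zero before training).
A1, B1 = nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False)
A2, B2 = nn.Linear(d, r, bias=False), nn.Linear(r, d, bias=False)

def base_forward(x):
    return torch.relu(W2(torch.relu(W1(x))))

def lora_forward(x):
    # Block 1's output is already modified by its adapter...
    h = torch.relu(W1(x) + B1(A1(x)))
    # ...so block 2 (and its adapter) operate on a different input.
    return torch.relu(W2(h) + B2(A2(h)))

x = torch.randn(1, d)

# Naive attempt: reuse cached base activations and just add LoRA deltas.
h_base = torch.relu(W1(x))
naive = base_forward(x) + B1(A1(x)) + B2(A2(h_base))

print(torch.allclose(lora_forward(x), naive))  # False
```

The split only works inside each individual linear layer; end to end, every block after the first must be re-run per adapter.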