How to build a service that dynamically loads embedding models? #4774

kobiche · 2024-06-03T22:33:01Z


Analogous to how Hugging Face loads models dynamically (just by passing the model name), I am wondering how the same could be achieved with BentoML.
Runners are now considered legacy, so how could this be done with the class and method decorators instead?
I am thinking of the following:

- The service offers an endpoint that takes the model name (and possibly a serving configuration) along with the text(s) to be embedded.
- The system checks whether the model is already loaded in memory (GPU/CPU) or must be initialized first.
- The system returns the embedding(s).
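
To make the flow above concrete, here is a rough, framework-agnostic sketch of the "load on first use" part. Everything in it is my own assumption rather than BentoML API: `ModelRegistry`, `fake_load_model`, and `load_count` are hypothetical names, and the fake loader only stands in for a real model loader so the snippet runs without any ML library. Presumably the `embed` method is what a `@bentoml.api` method on a `@bentoml.service` class would call, but that wiring is left out here.

```python
import threading
from typing import Callable, Dict, List

EmbedFn = Callable[[List[str]], List[List[float]]]


def fake_load_model(name: str) -> EmbedFn:
    """Hypothetical stand-in for a real (slow) model loader."""
    dim = 4  # arbitrary embedding size, for illustration only

    def embed(texts: List[str]) -> List[List[float]]:
        # Deterministic dummy vectors derived from the text length.
        return [[float(len(t) + i) for i in range(dim)] for t in texts]

    return embed


class ModelRegistry:
    """Keeps models in memory by name and loads them on first request."""

    def __init__(self, loader: Callable[[str], EmbedFn]) -> None:
        self._loader = loader
        self._models: Dict[str, EmbedFn] = {}
        self._lock = threading.Lock()  # avoid loading the same model twice concurrently
        self.load_count = 0  # number of actual loads, useful for checking cache hits

    def get(self, name: str) -> EmbedFn:
        with self._lock:
            if name not in self._models:
                # Not in memory yet: initialize it first.
                self._models[name] = self._loader(name)
                self.load_count += 1
            return self._models[name]

    def embed(self, model_name: str, texts: List[str]) -> List[List[float]]:
        return self.get(model_name)(texts)


registry = ModelRegistry(fake_load_model)
vectors = registry.embed("model-a", ["hello", "hi"])  # first call triggers the load
vectors_again = registry.embed("model-a", ["hello"])  # second call reuses the loaded model
```

Holding one lock across the load is the simplest choice, but it blocks requests for other models while one is loading; a per-model lock would avoid that at the cost of more bookkeeping.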

Optional:

- The system offers an endpoint that returns the list of currently loaded models.
- The system offers a delete endpoint to unload a model from memory.
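
For the two optional endpoints, this is roughly the bookkeeping I have in mind, again as a plain-Python sketch under my own assumptions. `LoadedModels`, `list_loaded`, and `unload` are hypothetical names, not BentoML API, and the loader passed in is a placeholder for whatever actually creates the model object.

```python
import gc
from typing import Any, Callable, Dict, List


class LoadedModels:
    """Tracks loaded models by name; values are whatever the loader returns."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}

    def load(self, name: str) -> Any:
        # Load only if the model is not already in memory.
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]

    def list_loaded(self) -> List[str]:
        # What a "list models" endpoint could return.
        return sorted(self._models)

    def unload(self, name: str) -> bool:
        # What a "delete model" endpoint could call.
        # Returns False if the model was not loaded in the first place.
        if name not in self._models:
            return False
        del self._models[name]  # drop the registry's reference
        gc.collect()  # encourage prompt cleanup of the model object
        return True


models = LoadedModels(loader=lambda name: object())  # placeholder loader
models.load("model-b")
models.load("model-a")
loaded_before = models.list_loaded()
removed = models.unload("model-a")
loaded_after = models.list_loaded()
```

Dropping the reference only frees memory if nothing else still holds the model. For GPU models I assume an extra framework-specific step would be needed as well, for example `torch.cuda.empty_cache()` with PyTorch.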
