Hugging Face lets you load a model dynamically just by typing its name. I wondered how the same could be achieved with BentoML.
I see that runners are now considered legacy. So, how could this be done using the class and method decorators instead?
I am thinking of the following:
1. The service offers an endpoint where the model (and, possibly, its serving configuration) is specified, along with the text(s) to be embedded.
2. The system recognizes whether the model is already loaded in memory (GPU/CPU) or must be initialized first.
3. The system outputs the embedding.

Optional:
- The system offers an endpoint that returns the list of loaded models.
- The system offers a delete endpoint to remove/unload a model from memory.
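
To make the idea more concrete, here is a rough sketch of the bookkeeping behind those steps, written in plain Python so it runs without BentoML installed. Everything in it is hypothetical: `ModelRegistry`, `fake_loader`, and the `Embedder` type are names I made up for illustration, and the loader is a stand-in for whatever actually loads a model by name. My assumption is that such a registry could be kept as an attribute of a class decorated with `@bentoml.service`, with `embed`, `loaded`, and `unload` exposed through `@bentoml.api` methods, but I have not verified that wiring.

```python
import threading
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a loaded model is anything that maps texts to vectors.
Embedder = Callable[[Sequence[str]], List[List[float]]]


class ModelRegistry:
    """Keeps loaded models in memory and loads missing ones on first use."""

    def __init__(self, loader: Callable[[str], Embedder]) -> None:
        self._loader = loader                     # called once per model name
        self._models: Dict[str, Embedder] = {}    # model name -> loaded model
        self._lock = threading.Lock()             # guards the dict across requests

    def get(self, name: str) -> Embedder:
        # Step 2: reuse the model if it is loaded, otherwise initialize it.
        # Holding the lock while loading is simple but serializes all loads.
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)
            return self._models[name]

    def embed(self, name: str, texts: Sequence[str]) -> List[List[float]]:
        # Steps 1 and 3: take a model name plus texts, return the embeddings.
        return self.get(name)(texts)

    def loaded(self) -> List[str]:
        # Optional: list the names of the models currently held in memory.
        with self._lock:
            return sorted(self._models)

    def unload(self, name: str) -> bool:
        # Optional: drop the reference; returns False if it was not loaded.
        # Actually freeing GPU memory may need framework-specific cleanup.
        with self._lock:
            return self._models.pop(name, None) is not None


def fake_loader(name: str) -> Embedder:
    # Stand-in loader (assumption): a real one would load the named model.
    def embed(texts: Sequence[str]) -> List[List[float]]:
        return [[float(len(name)), float(len(t))] for t in texts]
    return embed


registry = ModelRegistry(fake_loader)
vectors = registry.embed("model-a", ["hello", "hi"])
```

The loader is passed in rather than hard-coded so that the caching logic can be tested without downloading anything, and so the same registry could serve different model backends.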