-
I can imagine "packing" two models into my Service. The questions I have are:
-
It's pretty great to see you guys doing more advanced ML serving use cases. Right now, I think it is challenging to do shadow inference well with BentoService.
```python
from bentoml import BentoService, api
from bentoml.adapters import DataframeInput


class MyService(BentoService):

    def prediction_current(self, df):
        return self.models.current_model.predict(df)

    def prediction_shadow(self, df):
        return self.models.shadow_model.predict(df)

    @api(input=DataframeInput(), batch=True)
    def predict(self, df):
        current_prediction = self.prediction_current(df)
        shadow_prediction = self.prediction_shadow(df)
        return [{"result": current_prediction, "shadow_result": shadow_prediction}]
```

One of the challenges of doing shadowing with separate BentoServices is how to join the request logs for comparing prediction results to evaluate the models. Right now each BentoService generates a … Once we implement the feature that API services use …
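If both models are packed into one service like this, one way around the log-joining problem is to emit a single record per request that carries both predictions. Below is a minimal sketch of that idea, not a built-in BentoML feature: the `shadow_prediction_log` logger name and the `request_id` field are my own, the `self.models.<name>` access just mirrors the snippet above (adjust to how your BentoML version exposes packed models), and it assumes numeric model outputs.

```python
import json
import logging
import uuid

from bentoml import BentoService, api
from bentoml.adapters import DataframeInput

# Hypothetical logger name; any sink (a file, stdout, a log shipper) works here.
prediction_logger = logging.getLogger("shadow_prediction_log")


class MyShadowService(BentoService):

    @api(input=DataframeInput(), batch=True)
    def predict(self, df):
        request_id = str(uuid.uuid4())  # one id ties both predictions together

        # Attribute access follows the snippet above; adjust for how your
        # BentoML version exposes packed models.
        current_prediction = self.models.current_model.predict(df)
        shadow_prediction = self.models.shadow_model.predict(df)

        # Both results land in one record, so no cross-service log join is needed.
        prediction_logger.info(json.dumps({
            "request_id": request_id,
            "result": [float(p) for p in current_prediction],        # assumes numeric outputs
            "shadow_result": [float(p) for p in shadow_prediction],
        }))

        # Only the current model's predictions are returned to the caller.
        return [{"result": float(p)} for p in current_prediction]
```

With records like these, comparing current vs. shadow predictions can happen offline without touching the serving path.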
-
That's a great discussion point! Our team operates an API server in front of one serving model and multiple shadow models when we need to compare the performance of the models.
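For context, here is a stripped-down sketch of that setup, not our actual gateway code: the endpoint URLs are hypothetical placeholders, and plain `requests` plus background threads stand in for whatever proxy you already run. The primary model answers the request, while copies are mirrored to the shadow deployments on a best-effort basis.

```python
import threading

import requests

# Hypothetical endpoints; each model would be its own deployment behind the API server.
PRIMARY_URL = "http://primary-model:5000/predict"
SHADOW_URLS = [
    "http://shadow-model-a:5000/predict",
    "http://shadow-model-b:5000/predict",
]


def _mirror(url, payload):
    """Fire-and-forget call to one shadow model; failures must never reach the caller."""
    try:
        requests.post(url, json=payload, timeout=2)
    except requests.RequestException:
        pass  # shadow traffic is best-effort only


def predict(payload):
    """Answer from the primary model and mirror the same payload to every shadow model."""
    for url in SHADOW_URLS:
        threading.Thread(target=_mirror, args=(url, payload), daemon=True).start()
    response = requests.post(PRIMARY_URL, json=payload, timeout=2)
    return response.json()
```

The trade-off is that each shadow service then logs its predictions separately, which is where the log-joining concern above comes from.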
I don't think BentoML has out-of-the-box capability to mark models to run in a low-priority thread. @parano might have a good idea for this; I am going to defer to him.
You can leverage BentoML's logging when you deploy them in the same service.