First and foremost, awesome project you have created. :)
I have a question on the ModelMesh behaviour during Node Drains in k8s.
My Setup:
I am running ModelMesh as a sidecar in a modelmesh-serving deployment on EKS, with PodDisruptionBudgets configured for the serving runtimes so that only one of the 4 replicas is shut down at a time during node drains.
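For reference, the PDB for the serving runtimes looks roughly like this (the name and label selector are illustrative, not my exact manifest):

```yaml
# Hypothetical PodDisruptionBudget matching the setup above: at most one of
# the 4 serving-runtime replicas may be evicted at a time during a node drain.
# Adjust the selector to whatever labels your runtime pods actually carry.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: triton-runtime-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: modelmesh-serving
```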
I use ModelMesh to serve models on Triton utilizing GPUs; due to a warmup configuration, the models take approx. 25 s to load on a new server.
My Node drain Observation:
On node drain, ModelMesh runs its pre-stop handling for the pod (before SIGTERM), which includes waiting until all deployed models are loaded elsewhere.
However, during that window (after pre-stop, before all models are loaded elsewhere), ModelMesh doesn't accept new inference requests for those models, resulting in a ModelNotHereException when another ModelMesh instance tries to fulfil the inference request.
As a result, the inference request is delayed until the models are loaded on another instance, which in my case can take up to 25 s.
I can see that while the ModelMesh instance that is about to be shut down is getting the models loaded somewhere else, the models are still loaded on its own Triton.
This should enable ModelMesh to handle inference requests even after pre-stop.
Example request during node drain:
08:08:41.096: model-mesh-pre-stop (due to node drain)
08:08:57.272: another ModelMesh instance receives an inference request and gets a "ModelNotHereException" from the ModelMesh instance in pre-stop
08:09:06.251: model of the inference request is loaded again elsewhere (the old ModelMesh pod triggers the complete shutdown of Triton and itself)
08:09:06.417: inference response is sent
My Question:
Is it a misconfiguration on my side that ModelMesh does not accept inference requests after pre-stop, or is this intended behaviour?
P.S.: Sorry if this issue should be raised in the modelmesh-serving repo. If that is the case, I will reopen it there.