
ModelMesh behaviour during Node Drain (i.e. during Cluster Upgrade/Node Update) #142

Open
SDJustus opened this issue Jun 21, 2024 · 0 comments


Hey guys,

First and foremost, awesome project you have created. :)

I have a question about ModelMesh behaviour during node drains in Kubernetes.

My Setup:

  • I am running ModelMesh as a sidecar of a modelmesh-serving deployment on EKS, with PodDisruptionBudgets configured for the serving runtimes so that only one replica (of 4) is shut down at a time during node drains.
  • I use ModelMesh to serve GPU-backed models on Triton that take approx. 25 s to load on a new server due to a warmup configuration.
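For reference, a sketch of the kind of PodDisruptionBudget described above (at most one of the 4 serving-runtime replicas disrupted at a time); the name and selector labels here are placeholders, not my actual values:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: modelmesh-serving-pdb      # placeholder name
spec:
  maxUnavailable: 1                # only one of the 4 replicas may be down at a time
  selector:
    matchLabels:
      app: modelmesh-serving       # placeholder label matching the serving runtime pods
```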

My node drain observations:

  • on SIGTERM, ModelMesh runs a preStop hook for the pod, which includes waiting until all models deployed on that pod are loaded elsewhere
  • however, during that time (after preStop, before all models are loaded elsewhere), ModelMesh doesn't accept new inference requests for those models, resulting in a ModelNotHereException when another ModelMesh instance tries to fulfil the inference request
  • this delays the inference request until the models are loaded on another instance, which in my case can take up to 25 s
  • I can see that while the ModelMesh instance that is shutting down tries to load the models somewhere else, the models are still loaded on its Triton
  • this should enable ModelMesh to handle inference requests even after preStop
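As a client-side workaround for the delay described above (not part of ModelMesh itself), one could retry inference with backoff until the model finishes loading on another instance; `infer` and the exception type here are hypothetical stand-ins for the actual gRPC call and its "model not ready" error:

```python
import time


def infer_with_retry(infer, request, max_wait_s=30.0, base_delay_s=0.5):
    """Retry a hypothetical infer() call with exponential backoff.

    Intended to ride out the window during a node drain where a model
    is being reloaded on another ModelMesh instance (~25 s in the
    setup above). RuntimeError stands in for the transient error the
    real client would raise.
    """
    deadline = time.monotonic() + max_wait_s
    delay = base_delay_s
    while True:
        try:
            return infer(request)
        except RuntimeError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # give up once the overall budget is exhausted
            time.sleep(min(delay, remaining))
            delay = min(delay * 2, 5.0)  # cap the backoff interval
```

This only masks the delay for callers; it does not change the ModelNotHereException behaviour on the server side.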

Example request during node drain:

  • 08:08:41.096: model-mesh-pre-stop (due to node drain)
  • 08:08:57.272: another ModelMesh instance receives the inference request and gets a “ModelNotHereException” from the ModelMesh instance in preStop
  • 08:09:06.251: the model for the inference request is loaded again elsewhere (the old ModelMesh pod triggers a complete shutdown of Triton and itself)
  • 08:09:06.417: the inference response is sent

My Question:

  • is it a misconfiguration on my side that ModelMesh does not accept inference requests after preStop, or is this intended behaviour?

P.S.: Sorry if this issue should be raised in the modelmesh-serving repo. If that is the case, I will reopen it there.
