
Pods do not terminate gracefully and do not scale #4

Closed
KaiWalter opened this issue Dec 31, 2023 · 11 comments

Comments
@KaiWalter
Contributor

Given a running configuration

NAME                                 READY   STATUS    RESTARTS   AGE
distributor-7f7769c8f6-7wxhx         1/1     Running   0          24m
distributor-dapr-w968l               1/1     Running   0          24m
receiver-express-69c989476f-5w854    1/1     Running   0          9m2s
receiver-express-dapr-vlbp9          1/1     Running   0          9m
receiver-standard-69d6797586-rspx8   1/1     Running   0          9m2s
receiver-standard-dapr-vrv5v         1/1     Running   0          8m59s

after changing the workload and redeploying, the previous pods stay stuck in Terminating:

NAME                                 READY   STATUS        RESTARTS   AGE
distributor-7f7769c8f6-7wxhx         1/1     Running       0          29m
distributor-dapr-w968l               1/1     Running       0          29m
receiver-express-69c989476f-5w854    1/1     Terminating   0          14m
receiver-express-dapr-vlbp9          1/1     Running       0          14m
receiver-express-fb667dd79-tnn9g     1/1     Running       0          3m22s
receiver-standard-69d6797586-rspx8   1/1     Terminating   0          14m
receiver-standard-c8b8577b4-49c8j    1/1     Running       0          3m22s
receiver-standard-dapr-vrv5v         1/1     Running       0          14m
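
A minimal sketch of how one of the stuck pods could be inspected (the pod name is taken from the listing above; these commands are illustrative, not from the original report):

# hedged sketch: check events, deletion timestamp and finalizers of a pod stuck in Terminating
$ kubectl describe pod receiver-express-69c989476f-5w854
$ kubectl get pod receiver-express-69c989476f-5w854 -o jsonpath='{.metadata.deletionTimestamp} {.metadata.finalizers}'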
@KaiWalter
Contributor Author

Related issue: containerd/runwasi#418

@KaiWalter changed the title from "Pods do not terminate gracefully" to "Pods do not terminate gracefully and do not scale" on Jan 4, 2024
@KaiWalter
Copy link
Contributor Author

additionally, when I try to scale

$ kubectl scale deployment distributor --replicas 3

the new pods get stuck in ContainerCreating:

NAME                                READY   STATUS              RESTARTS   AGE
distributor-6765d649b5-77444        0/1     ContainerCreating   0          3m15s
distributor-6765d649b5-g6k8w        1/1     Running             0          115m
distributor-6765d649b5-m4mlr        0/1     ContainerCreating   0          3m15s
distributor-dapr-x4spd              1/1     Running             0          115m

Events, e.g. for the first pod:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  81s   default-scheduler  Successfully assigned default/distributor-6765d649b5-77444 to aks-npspin-84373721-vmss000000


devigned commented Jan 4, 2024

What version of the Spin shim are you using? Could you gather containerd logs related to the pods that are stuck in container creating?


Mossaka commented Jan 5, 2024

(side note: you can refer to this issue for accessing containerd logs in your cluster deislabs/containerd-wasm-shims#197)

@KaiWalter
Contributor Author

> What version of the Spin shim are you using? Could you gather containerd logs related to the pods that are stuck in container creating?

I am currently using the kwasm-node-installer main branch, which copies the Spin shim from /deislabs/containerd-wasm-shims/releases/download/v0.10.0/containerd-wasm-shims-v2-spin-linux-$(uname -m).tar.gz

I think I found the kubelet / containerd logs on the AKS node. I will redeploy with single pods to reduce noise in the logs and make analysis easier.
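
As a rough sketch, assuming kwasm-node-installer defaults (the /opt/kwasm path and the "spin" runtime name in the containerd config are assumptions, not verified here), the shim installation could be checked on the node like this:

# hedged sketch: confirm the Spin shim binary and its containerd runtime registration
# (binary path and runtime section name are assumed kwasm defaults)
$ ls -l /opt/kwasm/bin/containerd-shim-spin-v2
$ grep -n -A 2 'runtimes.spin' /etc/containerd/config.toml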

@KaiWalter
Contributor Author

@devigned @Mossaka I hope this journalctl -u kubelet output from the WASM-shimmed AKS node helps: aks-kubelet.txt
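
For context, a sketch of how such node-level logs can be pulled from an AKS node via an ephemeral debug pod (the ubuntu image and the time window are assumptions; the node name is taken from the scheduler event above):

# hedged sketch: open a shell on the node and dump the kubelet journal
$ kubectl debug node/aks-npspin-84373721-vmss000000 -it --image=ubuntu
# inside the debug container the host filesystem is mounted at /host:
$ chroot /host journalctl -u kubelet --since "30 min ago" --no-pager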


Mossaka commented Jan 5, 2024

It would be great to see containerd logs


devigned commented Jan 5, 2024

> It would be great to see containerd logs

All of the details about the shim starting and possibly failing will be contained at the containerd level. The Kubelet is a little too high level to provide the detail required to diagnose the issue.

@KaiWalter
Contributor Author

> It would be great to see containerd logs
>
> All of the details about the shim starting and possibly failing will be contained at the containerd level. The Kubelet is a little too high level to provide the detail required to diagnose the issue.

Do you have any pointers on where I could find containerd logs on an AKS node?


Mossaka commented Jan 8, 2024

Can you try journalctl -u containerd?
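
For example, something along these lines could narrow the journal down to shim-related entries (the time window and grep pattern are only assumptions):

# hedged sketch: filter the containerd journal for Spin/shim/wasm related entries
$ journalctl -u containerd --since "15 min ago" --no-pager | grep -iE 'spin|shim|wasm'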

@KaiWalter
Contributor Author

Thanks @Mossaka, that worked. But (also for @devigned): since I switched to kwasm-node-installer, this behavior seems to be gone:

$ k scale deployment --replicas 8 distributor
deployment.apps/distributor scaled
$ k get pod
NAME                                READY   STATUS        RESTARTS      AGE
distributor-54576c4fd5-9srhk        1/1     Running       0             14m
distributor-54576c4fd5-crpfn        1/1     Running       0             14m
distributor-54576c4fd5-fx86x        1/1     Running       0             19m
distributor-54576c4fd5-hjcgk        1/1     Running       0             14m
distributor-54576c4fd5-hn52r        1/1     Running       0             19m
distributor-54576c4fd5-jz42v        1/1     Running       0             14m
distributor-54576c4fd5-ldqs9        1/1     Terminating   0             14m
distributor-54576c4fd5-vxddt        1/1     Running       0             19m
distributor-54576c4fd5-zbwh5        1/1     Running       0             14m
distributor-dapr-5cd8c7cb9b-9w8mc   1/1     Running       0             19m
distributor-dapr-5cd8c7cb9b-rwkk6   1/1     Running       0             14m
distributor-dapr-5cd8c7cb9b-xgdgf   1/1     Running       1 (19m ago)   19m
kwasm-debug-bdq4l                   1/1     Running       0             5m54s
kwasm-initializer-9sr5q             1/1     Running       0             22m
... after a few seconds ...
$ k get pod
NAME                                READY   STATUS    RESTARTS      AGE
distributor-54576c4fd5-9srhk        1/1     Running   0             15m
distributor-54576c4fd5-crpfn        1/1     Running   0             15m
distributor-54576c4fd5-fx86x        1/1     Running   0             20m
distributor-54576c4fd5-hjcgk        1/1     Running   0             15m
distributor-54576c4fd5-hn52r        1/1     Running   0             20m
distributor-54576c4fd5-jz42v        1/1     Running   0             15m
distributor-54576c4fd5-vxddt        1/1     Running   0             20m
distributor-54576c4fd5-zbwh5        1/1     Running   0             15m
distributor-dapr-5cd8c7cb9b-9w8mc   1/1     Running   0             20m
distributor-dapr-5cd8c7cb9b-rwkk6   1/1     Running   0             15m
distributor-dapr-5cd8c7cb9b-xgdgf   1/1     Running   1 (20m ago)   20m
kwasm-debug-bdq4l                   1/1     Running   0             7m8s
kwasm-initializer-9sr5q             1/1     Running   0             23m
