
Pods do not terminate gracefully and do not scale #4

Closed
KaiWalter opened this issue Dec 31, 2023 · 11 comments

Comments
@KaiWalter
Contributor

Given a running configuration

NAME                                 READY   STATUS    RESTARTS   AGE
distributor-7f7769c8f6-7wxhx         1/1     Running   0          24m
distributor-dapr-w968l               1/1     Running   0          24m
receiver-express-69c989476f-5w854    1/1     Running   0          9m2s
receiver-express-dapr-vlbp9          1/1     Running   0          9m
receiver-standard-69d6797586-rspx8   1/1     Running   0          9m2s
receiver-standard-dapr-vrv5v         1/1     Running   0          8m59s

after changing the workload and redeploying, the previous pods stay stuck in Terminating:

NAME                                 READY   STATUS        RESTARTS   AGE
distributor-7f7769c8f6-7wxhx         1/1     Running       0          29m
distributor-dapr-w968l               1/1     Running       0          29m
receiver-express-69c989476f-5w854    1/1     Terminating   0          14m
receiver-express-dapr-vlbp9          1/1     Running       0          14m
receiver-express-fb667dd79-tnn9g     1/1     Running       0          3m22s
receiver-standard-69d6797586-rspx8   1/1     Terminating   0          14m
receiver-standard-c8b8577b4-49c8j    1/1     Running       0          3m22s
receiver-standard-dapr-vrv5v         1/1     Running       0          14m
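
A minimal sketch of how one of the stuck pods could be inspected (the pod name is taken from the listing above; these commands are illustrative, not from the original report):

# hedged sketch: check events, deletion timestamp and finalizers of a pod stuck in Terminating
$ kubectl describe pod receiver-express-69c989476f-5w854
$ kubectl get pod receiver-express-69c989476f-5w854 -o jsonpath='{.metadata.deletionTimestamp} {.metadata.finalizers}'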
@KaiWalter
Contributor Author

Related issue: containerd/runwasi#418

@KaiWalter changed the title from "Pods do not terminate gracefully" to "Pods do not terminate gracefully and do not scale" on Jan 4, 2024
@KaiWalter
Copy link
Contributor Author

additionally, when I try to scale

$ kubectl scale deployment distributor --replicas 3

the new pods get stuck in ContainerCreating:

NAME                                READY   STATUS              RESTARTS   AGE
distributor-6765d649b5-77444        0/1     ContainerCreating   0          3m15s
distributor-6765d649b5-g6k8w        1/1     Running             0          115m
distributor-6765d649b5-m4mlr        0/1     ContainerCreating   0          3m15s
distributor-dapr-x4spd              1/1     Running             0          115m

Events, e.g. for the first pod:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  81s   default-scheduler  Successfully assigned default/distributor-6765d649b5-77444 to aks-npspin-84373721-vmss000000


devigned commented Jan 4, 2024

What version of the Spin shim are you using? Could you gather containerd logs related to the pods that are stuck in container creating?


Mossaka commented Jan 5, 2024

(side note: you can refer to this issue for accessing containerd logs in your cluster deislabs/containerd-wasm-shims#197)

@KaiWalter
Contributor Author

> What version of the Spin shim are you using? Could you gather containerd logs related to the pods that are stuck in container creating?

I am currently using the kwasm-node-installer main branch, which copies the Spin shim from /deislabs/containerd-wasm-shims/releases/download/v0.10.0/containerd-wasm-shims-v2-spin-linux-$(uname -m).tar.gz

I think I found the kubelet / containerd logs on the AKS node. I will redeploy with single pods to reduce noise in the logs and make analysis easier.
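
As a rough sketch, assuming kwasm-node-installer defaults (the /opt/kwasm path and the "spin" runtime name in the containerd config are assumptions, not verified here), the shim installation could be checked on the node like this:

# hedged sketch: confirm the Spin shim binary and its containerd runtime registration
# (binary path and runtime section name are assumed kwasm defaults)
$ ls -l /opt/kwasm/bin/containerd-shim-spin-v2
$ grep -n -A 2 'runtimes.spin' /etc/containerd/config.toml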

@KaiWalter
Contributor Author

@devigned @Mossaka I hope this journalctl -u kubelet output from the WASM-shimmed AKS node helps: aks-kubelet.txt
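
For context, a sketch of how such node-level logs can be pulled from an AKS node via an ephemeral debug pod (the ubuntu image and the time window are assumptions; the node name is taken from the scheduler event above):

# hedged sketch: open a shell on the node and dump the kubelet journal
$ kubectl debug node/aks-npspin-84373721-vmss000000 -it --image=ubuntu
# inside the debug container the host filesystem is mounted at /host:
$ chroot /host journalctl -u kubelet --since "30 min ago" --no-pager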


Mossaka commented Jan 5, 2024

It would be great to see containerd logs


devigned commented Jan 5, 2024

> It would be great to see containerd logs

All of the details about the shim starting and possibly failing will be contained at the containerd level. The Kubelet is a little too high level to provide the detail required to diagnose the issue.

@KaiWalter
Contributor Author

> It would be great to see containerd logs
>
> All of the details about the shim starting and possibly failing will be contained at the containerd level. The Kubelet is a little too high level to provide the detail required to diagnose the issue.

Do you have any pointers on where I could find containerd logs on an AKS node?


Mossaka commented Jan 8, 2024

Can you try journalctl -u containerd?
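
For example, something along these lines could narrow the journal down to shim-related entries (the time window and grep pattern are only assumptions):

# hedged sketch: filter the containerd journal for Spin/shim/wasm related entries
$ journalctl -u containerd --since "15 min ago" --no-pager | grep -iE 'spin|shim|wasm'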

@KaiWalter
Contributor Author

Thanks @Mossaka, that worked. But (also for @devigned): since I switched to kwasm-node-installer, this behavior seems to be gone:

$ k scale deployment --replicas 8 distributor
deployment.apps/distributor scaled
$ k get pod
NAME                                READY   STATUS        RESTARTS      AGE
distributor-54576c4fd5-9srhk        1/1     Running       0             14m
distributor-54576c4fd5-crpfn        1/1     Running       0             14m
distributor-54576c4fd5-fx86x        1/1     Running       0             19m
distributor-54576c4fd5-hjcgk        1/1     Running       0             14m
distributor-54576c4fd5-hn52r        1/1     Running       0             19m
distributor-54576c4fd5-jz42v        1/1     Running       0             14m
distributor-54576c4fd5-ldqs9        1/1     Terminating   0             14m
distributor-54576c4fd5-vxddt        1/1     Running       0             19m
distributor-54576c4fd5-zbwh5        1/1     Running       0             14m
distributor-dapr-5cd8c7cb9b-9w8mc   1/1     Running       0             19m
distributor-dapr-5cd8c7cb9b-rwkk6   1/1     Running       0             14m
distributor-dapr-5cd8c7cb9b-xgdgf   1/1     Running       1 (19m ago)   19m
kwasm-debug-bdq4l                   1/1     Running       0             5m54s
kwasm-initializer-9sr5q             1/1     Running       0             22m
... after a few seconds ...
$ k get pod
NAME                                READY   STATUS    RESTARTS      AGE
distributor-54576c4fd5-9srhk        1/1     Running   0             15m
distributor-54576c4fd5-crpfn        1/1     Running   0             15m
distributor-54576c4fd5-fx86x        1/1     Running   0             20m
distributor-54576c4fd5-hjcgk        1/1     Running   0             15m
distributor-54576c4fd5-hn52r        1/1     Running   0             20m
distributor-54576c4fd5-jz42v        1/1     Running   0             15m
distributor-54576c4fd5-vxddt        1/1     Running   0             20m
distributor-54576c4fd5-zbwh5        1/1     Running   0             15m
distributor-dapr-5cd8c7cb9b-9w8mc   1/1     Running   0             20m
distributor-dapr-5cd8c7cb9b-rwkk6   1/1     Running   0             15m
distributor-dapr-5cd8c7cb9b-xgdgf   1/1     Running   1 (20m ago)   20m
kwasm-debug-bdq4l                   1/1     Running   0             7m8s
kwasm-initializer-9sr5q             1/1     Running   0             23m
