OCP4 Compute 0 stuck with 0% CPU #145

Closed
Javatar81 opened this issue Oct 27, 2023 · 10 comments
Labels
cluster/ocp4 Related to our ocp4 cluster at Stormshift

Comments

@Javatar81

Processes cannot be started on compute-0; they wait forever. Shutting down and rebooting does not help. The VM runs on storm3.

@Javatar81
Author

[core@compute-0 ~]$ sudo systemctl status kubelet
○ kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─01-kubens.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
Active: inactive (dead)

journalctl -b -u kubelet.service -u crio.service
-- No entries --

sudo systemctl start kubelet
--- WAITING FOREVER ----
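A `systemctl start` that never returns is often queued behind another job. As a rough sketch (not something that was run in this thread), the job queue can be filtered by state to see what kubelet is waiting on. The helper name `jobs_in_state` is made up for illustration, and it assumes the usual four-column `JOB UNIT TYPE STATE` layout of `systemctl list-jobs`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl list-jobs --no-legend` output on stdin
# and print the units whose job state (4th column) matches the first argument.
jobs_in_state() {
  awk -v state="$1" '$4 == state { print $2 }'
}

# Possible usage on the node (from a second SSH session while the start hangs):
#   systemctl list-jobs --no-legend | jobs_in_state waiting
#   systemctl list-jobs --no-legend | jobs_in_state running
```

A job reported as `running` while `kubelet.service` stays `waiting` would be the first unit to look at with `journalctl -b -u <unit>`.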

@Javatar81
Author

Oct 27 09:09:25 compute-0.ocp4.stormshift.coe.muc.redhat.com bash[21212]: Error: readlink /var/lib/containers/storage/overlay: invalid argument
Oct 27 09:09:25 compute-0.ocp4.stormshift.coe.muc.redhat.com systemd[1]: var-lib-containers-storage-overlay.mount: Deactivated successfully.

@rbo
Member

rbo commented Nov 14, 2023

Just deleted the NotReady node:

$ oc delete node compute-0.ocp4.stormshift.coe.muc.redhat.com
node "compute-0.ocp4.stormshift.coe.muc.redhat.com" deleted

To get the MCO ready again.

@rbo
Member

rbo commented Nov 14, 2023

Hard reboot of node compute-0.ocp4.stormshift.coe.muc.redhat.com

@Javatar81
Author

Scheduling is disabled for compute-0. Can I schedule or are you still working on this?

rbo added the cluster/ocp4 label (Related to our ocp4 cluster at Stormshift) on Nov 15, 2023
@rbo
Member

rbo commented Nov 15, 2023

I'm working on it


Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.

@rbo
Member

rbo commented Nov 15, 2023

compute-0 is joining again, CSR approved.

compute-0.ocp4.stormshift.coe.muc.redhat.com   NotReady                   worker          1s       v1.26.9+c7606e7
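For anyone repeating the rejoin step, here is a minimal sketch of how the pending CSRs could be picked out before approving them. The exact commands used in this thread were not posted, so the helper name `pending_csrs` is an assumption, and it relies on `CONDITION` being the last column of the default `oc get csr` table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the default `oc get csr` table on stdin and print
# the names of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Possible usage against the cluster (needs cluster-admin; review the list first):
#   oc get csr | pending_csrs
#   oc get csr | pending_csrs | xargs --no-run-if-empty oc adm certificate approve
```

A rejoining node typically submits a client CSR first and a serving CSR afterwards, so the approval may need to be repeated before the node reports Ready.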

@rbo
Member

rbo commented Nov 15, 2023

The solution was to clean up all images: sudo podman rmi --all
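Since removing every image is destructive, a dry-run wrapper around the clean-up may be useful. This is only a sketch: `run`, `cleanup_images` and the `DRY_RUN` variable are illustrative names and not part of the fix as reported. The only command taken from the thread is `sudo podman rmi --all`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN=1 (the default here) commands are only
# printed with a "+ " prefix; with DRY_RUN=0 they are actually executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

cleanup_images() {
  run sudo podman images      # list what is about to be removed
  run sudo podman rmi --all   # the command reported as the fix in this thread
}

# Possible usage on the node:
#   cleanup_images              # print the commands only
#   DRY_RUN=0 cleanup_images    # really remove all images
```

After the images are gone, the node presumably has to pull them again when pods are scheduled, so expect the first pod starts to be slower.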

@rbo
Member

rbo commented Nov 17, 2023

Looks like the cluster is working again.

rbo closed this as completed on Nov 17, 2023