HealthCheck always reports Healthy #15

Open
jrstarke opened this issue Aug 5, 2022 · 2 comments · May be fixed by #19


jrstarke commented Aug 5, 2022

I have a small Kubernetes cluster with 4 nodes. On some of the nodes, I have specialized devices attached. After configuring the devices in the config.json file, I found that the plugin was reporting Healthy even on hosts that had neither the hardware attached nor the device present in /dev.

I took it a step further and picked a name off the top of my head: /dev/nonexistentdevice. Even this device, which I would hope doesn't exist, came back as healthy.

2022/08/05 06:46:19 Starging K8s HostDevice Plugin.
2022/08/05 06:46:19 Starting FS watcher.
2022/08/05 06:46:19 Starting OS watcher.
2022/08/05 06:46:19 Reading /k8s-host-device-plugin/config.json
2022/08/05 06:46:19 loaded config:  {"resourceName":"f00f.xyz/nonexistent","socketName":"nonexistent.sock","hostDevices":[{"hostPath":"/dev/nonexistentdevice","containerPath":"/dev/nonexistent","permission":"rwm"}],"numDevices":1,"healthCheckIntervalSeconds":5}
2022/08/05 06:46:19 expanded host devices:
2022/08/05 06:46:19 Starting to serve on /var/lib/kubelet/device-plugins/nonexistent.sock
2022/08/05 06:46:19 Starting health check every 5 seconds
2022/08/05 06:46:19 Registered device plugin with Kubelet
exposing devices:  [&Device{ID:0,Health:Healthy,Topology:nil,}]
2022/08/05 06:46:24 Health is changed:  -> Healthy

mgeri commented Jan 22, 2023

Hi, I had the same problem, in my case because of a typo in the device config file.
The problem is in the Expand device function: when the pattern in the config file matches nothing, it returns an empty array, so the plugin exposes an empty device list that is reported as "Healthy".
@everpeace, if you agree, I can open a PR with a fix that returns an error, so the DaemonSet pod will not start and will report the error instead.
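
A minimal sketch of what such a fix could look like, not the plugin's actual code. It assumes the expansion step treats each hostPath as a glob pattern; the `HostDevice` type and the `expandHostDevices` function are hypothetical names made up for illustration, with fields taken from the config shown in the log above. The point is only that a pattern matching nothing becomes an error rather than an empty slice.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// HostDevice mirrors one "hostDevices" entry from config.json (hypothetical type).
type HostDevice struct {
	HostPath      string
	ContainerPath string
	Permission    string
}

// expandHostDevices is a hypothetical stand-in for the expansion step.
// Assumption: hostPath is a glob pattern. A pattern that matches nothing
// is reported as an error instead of silently yielding an empty list.
func expandHostDevices(devs []HostDevice) ([]HostDevice, error) {
	var expanded []HostDevice
	for _, d := range devs {
		matches, err := filepath.Glob(d.HostPath)
		if err != nil {
			return nil, fmt.Errorf("invalid hostPath pattern %q: %w", d.HostPath, err)
		}
		if len(matches) == 0 {
			return nil, fmt.Errorf("hostPath %q matched no device on this node", d.HostPath)
		}
		for _, m := range matches {
			// Simplification: every match keeps the configured container path.
			expanded = append(expanded, HostDevice{
				HostPath:      m,
				ContainerPath: d.ContainerPath,
				Permission:    d.Permission,
			})
		}
	}
	return expanded, nil
}

func main() {
	cfg := []HostDevice{{
		HostPath:      "/dev/nonexistentdevice",
		ContainerPath: "/dev/nonexistent",
		Permission:    "rwm",
	}}
	if _, err := expandHostDevices(cfg); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("devices expanded")
}
```

With a check like this, a typo in the config or a node without the hardware would surface at startup instead of showing up as a healthy device.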

@everpeace
Owner

> if you agree, I can open a pr with a fix to return an Error so the ds pod will not start reporting the error.

@mgeri Thanks. I'm looking forward to your PR.

mgeri linked a pull request on Feb 2, 2023 that will close this issue