I have a small Kubernetes cluster with 4 nodes. On some of the nodes, I have specialized devices attached. After configuring the devices in the config.json file, I found that the plugin reported Healthy even on hosts that had neither the hardware attached nor the device present in /dev.
I took it a step further and picked a name off the top of my head: /dev/nonexistentdevice. Even this device, which I would hope doesn't exist, came back as Healthy.
2022/08/05 06:46:19 Starging K8s HostDevice Plugin.
2022/08/05 06:46:19 Starting FS watcher.
2022/08/05 06:46:19 Starting OS watcher.
2022/08/05 06:46:19 Reading /k8s-host-device-plugin/config.json
2022/08/05 06:46:19 loaded config: {"resourceName":"f00f.xyz/nonexistent","socketName":"nonexistent.sock","hostDevices":[{"hostPath":"/dev/nonexistentdevice","containerPath":"/dev/nonexistent","permission":"rwm"}],"numDevices":1,"healthCheckIntervalSeconds":5}
2022/08/05 06:46:19 expanded host devices:
2022/08/05 06:46:19 Starting to serve on /var/lib/kubelet/device-plugins/nonexistent.sock
2022/08/05 06:46:19 Starting health check every 5 seconds
2022/08/05 06:46:19 Registered device plugin with Kubelet
exposing devices: [&Device{ID:0,Health:Healthy,Topology:nil,}]
2022/08/05 06:46:24 Health is changed: -> Healthy
Hi, I had the same problem. In my case, it was caused by a typo in the device config file.
The problem is in the Expand device function, which returns an empty array when nothing matches the pattern used in the config file. The plugin then exposes an empty device list, which is reported as "Healthy". @everpeace, if you agree, I can open a PR with a fix that returns an error, so the DaemonSet pod fails to start and reports the error.
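To make the idea concrete, here is a minimal sketch of what such a fix could look like. It is not the plugin's actual code: the name expandHostPaths is hypothetical, and it assumes the expansion is done with filepath.Glob, which returns no matches and no error when a path does not exist.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// expandHostPaths is a hypothetical helper, not the plugin's real function.
// It expands each configured host path pattern and returns an error as soon
// as one pattern matches nothing, instead of silently returning an empty list.
func expandHostPaths(patterns []string) ([]string, error) {
	var expanded []string
	for _, pattern := range patterns {
		// filepath.Glob only returns an error for a malformed pattern;
		// a missing path yields an empty result with a nil error.
		matches, err := filepath.Glob(pattern)
		if err != nil {
			return nil, fmt.Errorf("invalid host path pattern %q: %w", pattern, err)
		}
		if len(matches) == 0 {
			return nil, fmt.Errorf("host path %q matched no devices", pattern)
		}
		expanded = append(expanded, matches...)
	}
	return expanded, nil
}

func main() {
	// With the path from the report above, the caller gets an error it can
	// use to stop before registering with the kubelet.
	if _, err := expandHostPaths([]string{"/dev/nonexistentdevice"}); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```

With a check like this, a typo in config.json or a node without the hardware would surface as a startup error rather than as a device reported Healthy. Whether the pod should fail outright or just advertise zero devices on nodes without the hardware is a design choice for the maintainer.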