We found that the plugin reported the number of devices as 0, so I started to investigate why.
I checked the plugin's log and found that it was always exposing 1k devices (non-zero), which shows that the plugin itself was working normally.
I then checked the plugin's source code and found that it counts the number of devices once per period. The most important logic is that the plugin compares the number of devices with the value from the previous cycle, and only updates it and reports it to kubelet when the value has changed. An error may occur during communication with kubelet, after which the kubectl client obtains a value of 0. However, since the plugin only pushes an update when the actual value changes, the client's value is never refreshed. In the end, the number of devices has not actually changed, but the client's value cannot be corrected.
// ListAndWatch lists devices and update that list according to the health status
func (rs *resourceServer) ListAndWatch(e *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	log.Printf("ListAndWatch called by kubelet for: %s", rs.resourceName)
	resp := new(pluginapi.ListAndWatchResponse)
	// Send initial list of devices
	if err := rs.sendDevices(resp, s); err != nil {
		return err
	}
	rs.mutex.RLock()
	err := rs.updateCDISpec()
	rs.mutex.RUnlock()
	if err != nil {
		log.Printf("cannot update CDI specs: %v", err)
		return err
	}
	for {
		select {
		case <-s.Context().Done():
			log.Printf("ListAndWatch stream close: %v", s.Context().Err())
			return nil
		case d := <-rs.health:
			// FIXME: there is no way to recover from the Unhealthy state.
			d.Health = pluginapi.Unhealthy
			_ = s.Send(&pluginapi.ListAndWatchResponse{Devices: rs.devs})
		case <-rs.updateResource:
			if err := rs.sendDevices(resp, s); err != nil {
				// The old stream may not be closed properly, return to close it
				// and pass the update event to the new stream for processing
				rs.updateResource <- true
				return err
			}
			err := rs.updateCDISpec()
			if err != nil {
				log.Printf("cannot update CDI specs: %v", err)
				return err
			}
		}
	}
}
Therefore, I suggest adding a forced push to the update mechanism. For example, if the value has stayed the same for a specified number of cycles, the plugin would push an update anyway.
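To make the proposal concrete, here is a minimal sketch of the decision logic only. It is not taken from the plugin: `pushDecider`, `newPushDecider`, `shouldPush` and the `forceEvery` parameter are hypothetical names used for illustration, and how the plugin's periodic discovery loop is actually structured is an assumption.

```go
package main

// pushDecider is a hypothetical helper (not part of the plugin). It decides
// whether the device list should be re-sent to kubelet in the current cycle.
type pushDecider struct {
	lastCount   int  // device count seen in the previous cycle
	idleCycles  int  // consecutive cycles in which the count did not change
	forceEvery  int  // force a push after this many unchanged cycles; 0 disables it
	initialized bool // false until the first count has been observed
}

// newPushDecider returns a decider that forces a push once the count has been
// unchanged for forceEvery consecutive cycles.
func newPushDecider(forceEvery int) *pushDecider {
	return &pushDecider{forceEvery: forceEvery}
}

// shouldPush reports whether an update should be pushed for this cycle.
// It returns true when the count is seen for the first time, when it differs
// from the previous cycle, or when it has been unchanged for forceEvery cycles.
func (p *pushDecider) shouldPush(count int) bool {
	if !p.initialized || count != p.lastCount {
		p.initialized = true
		p.lastCount = count
		p.idleCycles = 0
		return true
	}
	p.idleCycles++
	if p.forceEvery > 0 && p.idleCycles >= p.forceEvery {
		p.idleCycles = 0
		return true
	}
	return false
}
```

In the plugin, the periodic check could call something like `shouldPush(len(devices))` and signal `rs.updateResource` whenever it returns true, so that a stale value of 0 on the client side is corrected within at most `forceEvery` cycles. Whether that is the right place to hook it in would need to be confirmed against the actual discovery loop.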