HBase has a TaskMonitor that can warn on any tasks that take too long.
https://github.com/apache/hbase/blob/47996d6c2128815e45bb8bdb6e3a470bfddd6106/hbase-server/src/main/java/org/apache/hadoop/hbase/monitoring/TaskMonitor.java#L49
By default it does not warn, because the setting `hbase.taskmonitor.rpc.warn.time` defaults to 0, which disables the check. A warning like this is clearly useful, though: we saw tasks stuck for a week at a customer and this went unnoticed.
As I'm not 100% sure which tasks are monitored (e.g. are Procedures included?), I suggest a rather high threshold.
Maybe 1 hour? (Even though most tasks should finish within seconds)
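To make the default-off behavior and the proposed threshold concrete, here is a small self-contained Java model of such a check. It is a simplified sketch, not HBase's `TaskMonitor` code: the `Task` record and `findStuckTasks` method are hypothetical, and it assumes the setting is measured in milliseconds, so "1 hour" becomes 3,600,000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class StuckTaskCheck {
    // Hypothetical, simplified task record; NOT HBase's MonitoredTask interface.
    record Task(String description, boolean running, long warnTimeMs) {}

    // Proposed threshold: 1 hour, ASSUMING the setting is in milliseconds.
    static final long PROPOSED_WARN_TIME_MS = TimeUnit.HOURS.toMillis(1);

    /** Returns the warnings that would be logged; a threshold <= 0 (default 0) disables the check. */
    static List<String> findStuckTasks(List<Task> tasks, long warnTimeMs, long nowMs) {
        List<String> warnings = new ArrayList<>();
        if (warnTimeMs <= 0) {
            return warnings; // check disabled, as with the default value 0
        }
        for (Task t : tasks) {
            // Only running tasks that exceeded the threshold are reported.
            if (t.running() && nowMs - t.warnTimeMs() >= warnTimeMs) {
                warnings.add("Task may be stuck: " + t.description());
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(7);
        List<Task> tasks = List.of(
            new Task("stuck for a week", true, 0L),
            new Task("started a second ago", true, now - 1_000L),
            new Task("finished long ago", false, 0L));
        // Default setting (0): nothing is reported. Prints "[]".
        System.out.println(findStuckTasks(tasks, 0L, now));
        // Proposed 1-hour threshold. Prints "[Task may be stuck: stuck for a week]".
        System.out.println(findStuckTasks(tasks, PROPOSED_WARN_TIME_MS, now));
    }
}
```

A high threshold like this only catches tasks that are clearly hung, which matches the goal of finding week-long stalls without flagging slow but healthy work.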
Later, we should extract this warning from the logs and turn it into an alert, or at least show these warnings on a dashboard (in Grafana or OpenSearch Dashboards).
The log message will start with: "Task may be stuck"
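As a starting point for that extraction, a minimal Java sketch that picks the warning out of a log file is shown here. The class and method names are hypothetical, and the sample lines are invented; in real logs the message is preceded by a layout prefix (timestamp, level, logger) whose exact form depends on the logging configuration, so the sketch matches the marker as a substring rather than at the start of the line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class StuckTaskLogScanner {
    // Marker from the issue; matched anywhere in the line because of the log layout prefix.
    static final String MARKER = "Task may be stuck";

    /** Returns every line that contains the stuck-task marker. */
    static List<String> stuckTaskLines(List<String> lines) {
        return lines.stream()
            .filter(line -> line.contains(MARKER))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StuckTaskLogScanner <path-to-log-file>
        // Without an argument, scan a small hypothetical sample instead.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Path.of(args[0]))
            : List.of(
                "2024-01-01 10:00:00,000 INFO  [example] some.Logger: unrelated message",
                "2024-01-01 11:00:00,000 WARN  [example] monitoring.TaskMonitor: Task may be stuck: <task>");
        List<String> hits = stuckTaskLines(lines);
        hits.forEach(System.out::println);
        // A non-zero count is what a later alert rule would fire on.
        System.out.println(hits.size() + " stuck-task warning(s) found");
    }
}
```

In practice the same substring filter would more likely live in the log shipper or in an OpenSearch/Grafana query than in a standalone program; the sketch only shows what to match on.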
Also note that older versions have a bug that can produce false warnings: https://issues.apache.org/jira/browse/HBASE-22935