Sometimes you may witness Storm thinking it has some Storm Worker processes, but those Worker processes aren't actually visible as tasks in the Mesos UI. This usually surfaces on the Storm UI's component view as Storm Executors that have a hostname & port, but an empty uptime string.
One of the causes of this situation relates to the contents of the `ExecutorInfo` for the new task. If there was an existing Storm Supervisor (Mesos Executor) on the target host for this task, and if the new task has different values in its `ExecutorInfo`, then the new task will be rejected by Mesos with a `TaskStatus` update containing a `TaskState` of `TASK_ERROR`.
The message will look like:
```
s.m.MesosNimbus [INFO] Received status update: {"task_id":"worker-host.domain-31000-1474755616.828","slave_id":"20160427-042423-617289226-5050-9149-S3","state":"TASK_ERROR","message":"Task has invalid ExecutorInfo (existing ExecutorInfo with same ExecutorID is not compatible). ...
```
This can happen for various reasons, since Mesos considers any variance in the `ExecutorInfo` to be a problem (the sketch after this list shows how these values feed into the `ExecutorInfo`):

- changing the Executor resources in `storm.yaml`
  - e.g., `topology.mesos.executor.cpu` or `topology.mesos.executor.mem`.
- changing the URIs used for downloading resources into the sandbox.
  - e.g., the URL for the Nimbus's Jetty server, which the worker hosts use to download the `storm.yaml` config from the Nimbus.
  - e.g., the URI from which the `storm-mesos` release tarball is downloaded.
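For illustration, here is a minimal sketch (not the framework's actual builder code; the `ExecutorID` format and method signature are invented for the example) of how a Mesos framework assembles an `ExecutorInfo` from such config values. Since Mesos compares the whole message, any change to the resources or URIs below makes the new `ExecutorInfo` incompatible with the Executor already registered under the same `ExecutorID`:

```java
import org.apache.mesos.Protos;

// Illustrative only -- the config values (executor CPU/mem, download URIs)
// flow straight into the ExecutorInfo protobuf, so changing any of them
// changes the ExecutorInfo that Mesos compares against the running Executor.
Protos.ExecutorInfo buildSupervisorExecutor(String host, double cpu, double mem,
                                            String confUri, String tarballUri) {
  return Protos.ExecutorInfo.newBuilder()
      // Same ExecutorID as the Supervisor already running on this host.
      .setExecutorId(Protos.ExecutorID.newBuilder().setValue("storm-supervisor|" + host))
      .setCommand(Protos.CommandInfo.newBuilder()
          .setValue("cd storm-mesos* && bin/storm supervisor")
          // e.g. the Nimbus Jetty URL serving storm.yaml, and the release tarball.
          .addUris(Protos.CommandInfo.URI.newBuilder().setValue(confUri))
          .addUris(Protos.CommandInfo.URI.newBuilder().setValue(tarballUri)))
      // topology.mesos.executor.cpu / topology.mesos.executor.mem end up here.
      .addResources(Protos.Resource.newBuilder().setName("cpus")
          .setType(Protos.Value.Type.SCALAR)
          .setScalar(Protos.Value.Scalar.newBuilder().setValue(cpu)))
      .addResources(Protos.Resource.newBuilder().setName("mem")
          .setType(Protos.Value.Type.SCALAR)
          .setScalar(Protos.Value.Scalar.newBuilder().setValue(mem)))
      .build();
}
```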
So, with the current framework implementation, if we ever want to change those values, we must kill all of the existing Supervisors and Executors under this framework instance before enabling the new config; otherwise we end up with confusing problems.
It would be nice if the framework could instead detect such a mismatch and automatically kill the existing Executor/Supervisor.
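As a rough sketch of how that might work, assuming a class implementing `org.apache.mesos.Scheduler` with the v0 Java driver API (the `tasksOnSlave` helper and the substring match on the error message are assumptions, not existing framework code):

```java
import org.apache.mesos.Protos;
import org.apache.mesos.SchedulerDriver;

// Sketch of the proposed behavior inside the framework's Scheduler callback.
// On the incompatible-ExecutorInfo rejection, kill every task we still hold
// under the stale Supervisor; once its last task dies, Mesos shuts the
// Executor down, so the next launch with the new ExecutorInfo can succeed.
@Override
public void statusUpdate(SchedulerDriver driver, Protos.TaskStatus status) {
  boolean staleExecutor =
      status.getState() == Protos.TaskState.TASK_ERROR
      && status.getMessage().contains(
          "existing ExecutorInfo with same ExecutorID is not compatible");
  if (staleExecutor) {
    // tasksOnSlave() is a hypothetical helper returning the TaskIDs this
    // framework is tracking on the slave that rejected the launch.
    for (Protos.TaskID staleTask : tasksOnSlave(status.getSlaveId())) {
      driver.killTask(staleTask);
    }
  }
  // ... existing status-update bookkeeping would continue here ...
}
```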