+1 on building such an awesome product, guys. Here's an issue I've run into a couple of times:

If you hit an OOM or do something else that corrupts state, you can lose workers that won't come back with a `bc.dask_client.restart()` or `client.restart()`.
This isn't a huge issue, because it can be quickly fixed by stopping and starting the cluster, and if a 32-node cluster drops to 25 workers, everything still works.
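One way to detect this situation after a restart is to compare the expected worker count against what the scheduler actually reports. A minimal sketch, assuming the `dict` layout returned by `distributed`'s `Client.scheduler_info()` (the `missing_workers` helper is hypothetical, not part of any library):

```python
def missing_workers(expected: int, scheduler_info: dict) -> int:
    """Return how many workers are missing relative to the expected count.

    `scheduler_info` is assumed to follow the shape returned by
    dask.distributed.Client.scheduler_info(), where connected workers
    live under the "workers" key.
    """
    connected = len(scheduler_info.get("workers", {}))
    return max(0, expected - connected)


# Hypothetical usage with a live client:
#   client.restart()
#   if missing_workers(128, client.scheduler_info()) > 0:
#       # restart() didn't bring everyone back; stop/start the cluster instead
#       ...
```

This only flags the problem; as noted above, the actual fix when `restart()` fails is to stop and start the cluster.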
More of an issue: I just stopped and started a 128-node cluster and it came up with only 1 worker. Restarting the Dask client from within Python didn't help. I'm trying to reproduce it. I took some screenshots and kept the logs; I'll send them over.
JB