+1 on building such an awesome product, guys. Here's an issue I've run into a couple of times:

If you hit an OOM or do something else that corrupts state, you can lose workers that won't come back with a `bc.dask_client.restart()` or `client.restart()`.
This isn't a huge issue, because it can be quickly fixed by stopping and starting the cluster, and if a 32-node cluster drops to 25 workers, everything still works.
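One way to detect this situation after a restart is to compare the expected worker count against what the scheduler actually reports. A minimal sketch, assuming the `dict` layout returned by `distributed`'s `Client.scheduler_info()` (the `missing_workers` helper is hypothetical, not part of any library):

```python
def missing_workers(expected: int, scheduler_info: dict) -> int:
    """Return how many workers are missing relative to the expected count.

    `scheduler_info` is assumed to follow the shape returned by
    dask.distributed.Client.scheduler_info(), where connected workers
    live under the "workers" key.
    """
    connected = len(scheduler_info.get("workers", {}))
    return max(0, expected - connected)


# Hypothetical usage with a live client:
#   client.restart()
#   if missing_workers(128, client.scheduler_info()) > 0:
#       # restart() didn't bring everyone back; stop/start the cluster instead
#       ...
```

This only flags the problem; as noted above, the actual fix when `restart()` fails is to stop and start the cluster.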
More of an issue: I just stopped and started a 128-node cluster and it came up with only 1 worker. Restarting the Dask client from within Python didn't help. I'm trying to reproduce it. I took some screenshots and kept the logs; I'll send them over.
JB