Description
When running Cellpose segmentation with the dask backend, the cell crashes after a while.
Multiple workers show the error `exceeded 95% memory budget. Restarting...`. After a while, a message says that a task will be marked as failed because 4 workers died while trying to run it.
Then it crashes completely with the following errors:
```
2024-09-25 16:02:04,452 - distributed.scheduler - WARNING - Removing worker 'tcp://127.0.0.1:58018' caused the cluster to lose already computed task(s), which will be recomputed elsewhere: {('array-0173291e659995cef21d3d1e6515a34d', 0)} (stimulus_id='handle-worker-cleanup-1727272924.4456077')
2024-09-25 16:02:04,458 - distributed.scheduler - WARNING - Removing worker 'tcp://127.0.0.1:57843' caused the cluster to lose already computed task(s), which will be recomputed elsewhere: {'shuffle-taker-1981a4f154033ba88983f1452daf58f3', ('block-info-_map_read_frame-b518e369790450b6bf2ef0f396523719', 0, 0, 0)} (stimulus_id='handle-worker-cleanup-1727272924.4513524')
2024-09-25 16:02:07,609 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-25 16:02:07,610 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-25 16:02:07,612 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-25 16:02:07,614 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-25 16:02:07,615 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-25 16:02:08,612 - distributed.client - ERROR -
Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\asyncio\tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\utils.py", line 806, in wrapper
    return await func(*args, **kwargs)
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\client.py", line 1938, in _close
    await self.cluster.close()
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\deploy\spec.py", line 448, in _close
    await self._correct_state()
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\deploy\spec.py", line 359, in _correct_state_internal
    await asyncio.gather(*tasks)
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\nanny.py", line 619, in close
    await self.kill(timeout=timeout, reason=reason)
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\nanny.py", line 400, in kill
    await self.process.kill(reason=reason, timeout=timeout)
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\nanny.py", line 882, in kill
    await process.join(max(0, deadline - time()))
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\process.py", line 330, in join
    await wait_for(asyncio.shield(self._exit_future), timeout)
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\utils.py", line 1926, in wait_for
    return await asyncio.wait_for(fut, timeout)
  File "S:\anaconda_envs\sopa\lib\asyncio\tasks.py", line 458, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-09-25 16:02:08,614 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x000001478C780190>>, <Task finished name='Task-63060' coro=<SpecCluster._correct_state_internal() done, defined at S:\anaconda_envs\sopa\lib\site-packages\distributed\deploy\spec.py:346> exception=TimeoutError()>)
Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\asyncio\tasks.py", line 456, in wait_for
    return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\site-packages\tornado\ioloop.py", line 750, in _run_callback
    ret = callback()
  File "S:\anaconda_envs\sopa\lib\site-packages\tornado\ioloop.py", line 774, in _discard_future_result
    future.result()
asyncio.exceptions.TimeoutError
Future exception was never retrieved
future: <Future finished exception=PermissionError(13, 'Access is denied', None, 5, None)>
Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "S:\anaconda_envs\sopa\lib\multiprocessing\process.py", line 140, in kill
    self._popen.kill()
  File "S:\anaconda_envs\sopa\lib\multiprocessing\popen_spawn_win32.py", line 123, in terminate
    _winapi.TerminateProcess(int(self._handle), TERMINATE)
PermissionError: [WinError 5] Access is denied
Future exception was never retrieved
future: <Future finished exception=PermissionError(13, 'Access is denied', None, 5, None)>
Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "S:\anaconda_envs\sopa\lib\multiprocessing\process.py", line 140, in kill
    self._popen.kill()
  File "S:\anaconda_envs\sopa\lib\multiprocessing\popen_spawn_win32.py", line 123, in terminate
    _winapi.TerminateProcess(int(self._handle), TERMINATE)
PermissionError: [WinError 5] Access is denied
Future exception was never retrieved
future: <Future finished exception=PermissionError(13, 'Access is denied', None, 5, None)>
Traceback (most recent call last):
  File "S:\anaconda_envs\sopa\lib\site-packages\distributed\process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "S:\anaconda_envs\sopa\lib\multiprocessing\process.py", line 140, in kill
    self._popen.kill()
  File "S:\anaconda_envs\sopa\lib\multiprocessing\popen_spawn_win32.py", line 123, in terminate
    _winapi.TerminateProcess(int(self._handle), TERMINATE)
PermissionError: [WinError 5] Access is denied
```
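The two failure stages described in the report (memory-based worker restarts, then a task marked as failed after 4 worker deaths) can be illustrated with a toy model. This is a simplified sketch, not dask's implementation. It assumes dask's default settings as I understand them: the nanny restarts a worker once its memory passes the `distributed.worker.memory.terminate` fraction (0.95), and the scheduler gives up on a task once more than `distributed.scheduler.allowed-failures` (3) workers have died while running it. The `simulate_task` helper, the `Outcome` type, and the memory figures are hypothetical, and the model treats one task's peak memory as the worker's only memory consumer.

```python
from dataclasses import dataclass

TERMINATE_FRACTION = 0.95  # assumed default of distributed.worker.memory.terminate
ALLOWED_FAILURES = 3       # assumed default of distributed.scheduler.allowed-failures


@dataclass
class Outcome:
    succeeded: bool
    workers_killed: int


def simulate_task(peak_task_bytes: int, worker_limit_bytes: int) -> Outcome:
    """Hypothetical toy model: rerun one task on fresh workers until it fits or is marked failed."""
    threshold = TERMINATE_FRACTION * worker_limit_bytes
    killed = 0
    while killed <= ALLOWED_FAILURES:
        if peak_task_bytes < threshold:
            # The task stays under the restart threshold and completes.
            return Outcome(succeeded=True, workers_killed=killed)
        # The worker exceeds its budget and is restarted; the task is rescheduled elsewhere.
        killed += 1
    # More than ALLOWED_FAILURES workers died: the scheduler marks the task as failed.
    return Outcome(succeeded=False, workers_killed=killed)
```

Because the model is deterministic, a task that does not fit in one worker's budget never fits in another of the same size: `simulate_task(20 * 1024**3, 16 * 1024**3)` ends with 4 killed workers and a failed task, which matches the "4 workers died" message. In a real run, memory pressure also comes from other tasks and data held on the same worker.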
Expected behavior
Cellpose patches are created and processed.
System
- OS: Windows 10
- Python version: 3.10.15
- RAM: 256 GB
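With dask's `LocalCluster`, `memory_limit="auto"` splits the total system memory evenly between the workers, so even 256 GB leaves each worker a fairly small budget once many workers are started. A minimal sketch of that arithmetic, assuming an even split and the default 0.95 terminate fraction; the function names and worker counts are hypothetical, since the report does not say how many workers were started.

```python
GIB = 1024 ** 3


def per_worker_budget(total_ram_bytes: int, n_workers: int) -> int:
    """Memory limit per worker when total RAM is split evenly (assumed memory_limit="auto" behavior)."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    return total_ram_bytes // n_workers


def restart_threshold(total_ram_bytes: int, n_workers: int,
                      terminate_fraction: float = 0.95) -> float:
    """Memory usage (in bytes) above which the nanny would restart a worker."""
    return terminate_fraction * per_worker_budget(total_ram_bytes, n_workers)


if __name__ == "__main__":
    total = 256 * GIB  # RAM reported in the issue
    for n in (8, 16, 64):  # hypothetical worker counts
        print(f"{n:>2} workers: {per_worker_budget(total, n) / GIB:.1f} GiB each, "
              f"restart above {restart_threshold(total, n) / GIB:.2f} GiB")
```

For example, with 64 workers each one gets 4 GiB and would be restarted above roughly 3.8 GiB, which a large Cellpose patch plus the model weights could plausibly exceed.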
As for the CPU, we are running in a virtualized environment that shares resources between several VMs, but each VM should have between 48 and 64 cores.
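Since the VM may expose 48 to 64 cores, one worker per core can leave each worker with little memory. One way to size the cluster, sketched below, is to cap the worker count by both the core count and the memory each worker needs. This assumes you can estimate the peak memory of a single Cellpose patch; the `max_workers` helper, the peak value, and the `headroom` factor are hypothetical. The result could then be passed as `n_workers` (with `threads_per_worker=1`) to dask's `LocalCluster`.

```python
import math

GIB = 1024 ** 3


def max_workers(total_ram_bytes: int, n_cores: int, peak_task_bytes: int,
                terminate_fraction: float = 0.95, headroom: float = 1.5) -> int:
    """Largest single-threaded worker count that keeps peak * headroom under the restart threshold.

    headroom is a hypothetical safety factor for memory not attributed to the task
    itself (model weights, Python overhead, results kept on the worker).
    Always returns at least 1, even if a single worker cannot fit the task.
    """
    # Each worker's evenly split budget must satisfy:
    # terminate_fraction * budget >= peak_task_bytes * headroom
    needed_per_worker = peak_task_bytes * headroom / terminate_fraction
    by_memory = math.floor(total_ram_bytes / needed_per_worker)
    return max(1, min(n_cores, by_memory))
```

With 256 GiB, 64 cores, and a hypothetical 6 GiB peak per patch, this gives 27 workers, so memory rather than the core count is the limiting factor. With a 1 GiB peak, the core count (48 or 64) becomes the limit instead.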
Alright, thanks for the details. I'm still experimenting with the dask Client, so I'll try to improve it over time to reach a stable release in sopa 2.0.0.