Split large ancestor groups up for both caching and dask scheduling. #838

benjeffery · 2023-06-18T00:13:13Z

We have very large ancestor groups towards the end of matching. As each of these takes over a month of CPU time, it would be best to split them up. This would let us resume a group mid-way and use smaller dask.bag partitions. Smaller partitions lose less work when a task is interrupted by worker roll-over, and they reduce wasted worker time at the end of a group because smaller tasks tessellate better across workers.
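
A minimal sketch of the idea, not the project's actual code: `split_group`, `pending_chunks`, `chunk_size` and the `done` set are all hypothetical names chosen for illustration. It splits one group into fixed-size chunks and skips chunks that were already completed (for example, recorded in a cache), so an interrupted group could resume part-way through.

```python
from typing import Iterator, List, Sequence, Set, Tuple


def split_group(ancestor_ids: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Split one ancestor group into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [
        ancestor_ids[start:start + chunk_size]
        for start in range(0, len(ancestor_ids), chunk_size)
    ]


def pending_chunks(
    chunks: List[Sequence[int]], done: Set[int]
) -> Iterator[Tuple[int, Sequence[int]]]:
    """Yield (index, chunk) for every chunk whose index is not in `done`."""
    for index, chunk in enumerate(chunks):
        if index not in done:
            yield index, chunk


# Hypothetical group of 10 ancestors, split into chunks of 4.
group = list(range(10))
chunks = split_group(group, chunk_size=4)

# Assume chunk 0 was cached before an interruption; only the rest is redone.
remaining = list(pending_chunks(chunks, done={0}))
```

Each remaining chunk could then be submitted as its own task, for instance as one element of a bag built with `dask.bag.from_sequence`, so that a worker roll-over costs at most one chunk of work rather than the whole group.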

benjeffery · 2024-06-06T08:43:23Z

Superseded by #921

benjeffery added this to the Release 0.4.0 milestone Jun 18, 2023

benjeffery closed this as completed Jun 6, 2024
