Batch job ancestor matching. #921
Can't say I follow all the snakemaking here, but it looks quite logical and the tsinfer operations should be simple enough. I guess it's worth stress testing this with something large to make sure snakemake can handle the type of task graph we'll be creating?
Have been playing around with this: with a large number of partitioned groups, snakemake takes a long time rebuilding the DAG between groups. I'm copying the code over to GeL to do some testing on a real ancestor grouping and to see whether the DAG rebuilding is problematic on that file system.
Ah, looks like I can refactor to avoid the checkpoints on init, but it means moving all the decision making about the number of partitions to the initial |
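One way to read the checkpoint-free refactor: if the number of partitions per group is decided once, up front, then every job target is known before any matching starts, so the workflow engine never has to re-evaluate the DAG mid-run. The sketch below illustrates only that planning step. The group sizes, the `max_ancestors_per_partition` threshold, and the file naming scheme are all hypothetical and are not tsinfer's actual metadata format.

```python
import math


def plan_partitions(group_sizes, max_ancestors_per_partition):
    """Decide, up front, how many partitions each ancestor group gets.

    Hypothetical rule: groups at or below the threshold are matched locally
    (recorded as 0, meaning "not partitioned"); larger groups are split into
    ceil(n / threshold) partitions.
    """
    plan = {}
    for group, n in group_sizes.items():
        if n <= max_ancestors_per_partition:
            plan[group] = 0
        else:
            plan[group] = math.ceil(n / max_ancestors_per_partition)
    return plan


def enumerate_targets(plan, workdir="work"):
    """List every output file the workflow will produce, in group order.

    Because the plan is fixed before any matching runs, this list (and hence
    the job graph) is static. The paths are illustrative placeholders.
    """
    targets = []
    for group in sorted(plan):
        n_parts = plan[group]
        if n_parts > 0:
            # Partitioned group: one marker file per partition job.
            for p in range(n_parts):
                targets.append(f"{workdir}/group_{group}/partition_{p}.done")
        # Every group, partitioned or not, ends with a per-group ts file.
        targets.append(f"{workdir}/group_{group}.trees")
    return targets
```

For example, `plan_partitions({0: 5, 1: 250, 2: 40}, 100)` gives `{0: 0, 1: 3, 2: 0}`, and the target list then contains three partition markers for group 1 plus one `.trees` file per group. This list records only which files exist; the ordering dependency between consecutive groups would still have to be expressed through each job's inputs.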
After some hairy snakemake deliberation, here is the strawman pipeline for ancestor matching. It requires the following tsinfer methods:

- match_ancestors_batch_init - creates a folder with metadata
- match_ancestors_batch_group - matches a group locally and writes the ts to the folder
- match_ancestors_batch_group_init - creates a folder for a group and writes metadata on its partitions
- match_ancestors_batch_group_partition - matches one partition of ancestors for a group
- match_ancestors_batch_group_finalise - uses the partitions to write a ts for the group
- match_ancestors_batch_finalise - writes the final ts

@jeromekelleher
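A minimal sketch of the order in which the six methods above would be invoked, assuming a precomputed plan that maps each group index to its number of partitions (0 meaning "match locally"). It only records the call sequence; it does not call tsinfer, and the argument tuples are hypothetical, not the real method signatures.

```python
def pipeline_calls(plan):
    """Return the (method_name, args) sequence implied by a partition plan.

    Groups are visited in index order, since each group is presumably matched
    against the ts produced by the previous ones. Partitions within a group
    are independent of each other, so in a real workflow those jobs could run
    in parallel between the group's init and finalise steps.
    """
    calls = [("match_ancestors_batch_init", ())]
    for group in sorted(plan):
        n_parts = plan[group]
        if n_parts == 0:
            # Small group: match it in a single local job.
            calls.append(("match_ancestors_batch_group", (group,)))
        else:
            # Large group: set up partitions, match each, then merge.
            calls.append(("match_ancestors_batch_group_init", (group,)))
            for p in range(n_parts):
                calls.append(("match_ancestors_batch_group_partition", (group, p)))
            calls.append(("match_ancestors_batch_group_finalise", (group,)))
    calls.append(("match_ancestors_batch_finalise", ()))
    return calls
```

For a plan of `{0: 0, 1: 2}`, this yields init, a local match for group 0, then group 1's init, two partition jobs, and its finalise, followed by the overall finalise: seven steps in total.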