One of the "issues" with sharding at the moment is that once the mapped BAMs get collected again for merging, the order of the files is inconsistent, so downstream BAMs and their headers do not have stable checksums across runs.
The inconsistency comes from the record of the mapping command, which contains the shard filename. Example output from the test profile:
I think we could use some channel manipulation to sort the filenames going into shard merging, to ensure that the headers of downstream BAMs are consistent across runs (i.e. get the command to always show part_001).
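As a rough, language-agnostic illustration of the sorting idea (written in Python rather than as the actual Nextflow channel logic), the shard BAMs could be ordered by the numeric index in their `part_NNN` token before merging, so that `part_001` is always first. The filenames and the helper names `shard_sort_key` and `sort_shards` are hypothetical; only the `part_001` naming comes from the description above.

```python
import re
from pathlib import Path


def shard_sort_key(bam_path):
    """Return a sort key: numeric shard index from 'part_NNN', then the file name."""
    name = Path(bam_path).name
    match = re.search(r"part_(\d+)", name)
    # Files without a shard token sort last, ordered by name.
    index = int(match.group(1)) if match else float("inf")
    return (index, name)


def sort_shards(bam_paths):
    """Return the shard paths in a deterministic order, independent of arrival order."""
    return sorted(bam_paths, key=shard_sort_key)


# Hypothetical shard names arriving in arbitrary order.
shards = ["sample.part_003.bam", "sample.part_001.bam", "sample.part_002.bam"]
print(sort_shards(shards))
```

In the pipeline itself, the equivalent step would presumably be a sorting operator applied to the channel that collects the mapped shards, which is an assumption about the implementation rather than a description of existing code.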
I think at that point we could also change sharding to be turned on by default, as it makes sense to use it in most cases, instead of requiring users to configure the resources for their mapper manually.