fix: switch mem_gb to mem_mb (#262)
* fix: specifying `mem_gb` in a rule while the default-resources of a cluster system specify `mem_mb` leads to multiple distinct memory resource definitions that can get confused, so we should just stick to the standard `mem_mb` here

* mapping.smk: make mem_mb value an int

* fix: make the `mem_mb` requirement of `rule annotate_umis` dynamic

The [documentation of `fgbio AnnotateBamWithUmis`](https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html) states that this tool reads the entire input UMI fastq files into memory in an uncompressed format. As we work with gzipped fastq files, I would expect this to take about 4x the size of the input `fastq.gz` files, according to [Table 2](https://academic.oup.com/view-large/394488195) of this paper:
Marius Nicolae and others, LFQC: a lossless compression algorithm for FASTQ files, Bioinformatics, Volume 31, Issue 20, October 2015, Pages 3276–3281, https://doi.org/10.1093/bioinformatics/btv384

As we should plan for some extra headroom, but also have the `bam` file as another input, I think that `4 * input.size_mb` should be a good estimate.

This can be rather heavy on the memory requirements, but this should be fine on modern servers and cluster systems -- and I think this workflow should usually be run on bigger compute infrastructure. So I think this is acceptable, but as an alternative we could sort the `fastq.gz` files beforehand and then use the `fgbio AnnotateBamWithUmis` flag `--sorted`.

* formatting

* go down to `2.5 * input.size_mb`, as the BAM file already almost doubles the total input size

---------

Co-authored-by: Johannes Köster <[email protected]>
dlaehnemann and johanneskoester authored Aug 24, 2023
1 parent cb893cc commit 6bda9b5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion workflow/rules/mapping.smk
@@ -37,7 +37,7 @@ rule annotate_umis:
     params:
         extra=get_umi_read_structure,
     resources:
-        mem_gb="10",
+        mem_mb=lambda wc, input: 2.5 * input.size_mb,
     log:
         "logs/fgbio/annotate_bam/{sample}.log",
     wrapper:
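
The memory estimate from the commit message can be sketched as a small standalone Python function. This is only an illustration, not part of the commit: the helper name `estimate_mem_mb`, the integer rounding, and the `floor_mb` minimum are assumptions added here. The rounding follows the earlier note about keeping `mem_mb` an int, and the floor guards against very small inputs.

```python
import math


def estimate_mem_mb(input_size_mb: float, factor: float = 2.5, floor_mb: int = 1024) -> int:
    """Estimate a memory request in MB from the total input size in MB.

    factor=2.5 mirrors the value chosen in the commit. floor_mb is a
    hypothetical lower bound that the commit itself does not include.
    """
    # Round up so the result is an int and never below the scaled size.
    scaled = math.ceil(factor * input_size_mb)
    return max(floor_mb, scaled)


# Hypothetical use inside the rule (input.size_mb is the combined size
# of all input files as reported by Snakemake):
#     resources:
#         mem_mb=lambda wc, input: estimate_mem_mb(input.size_mb),

if __name__ == "__main__":
    # 4000 MB of combined BAM and fastq.gz input
    print(estimate_mem_mb(4000))
```

With a combined input of 4000 MB this requests 10000 MB, which happens to match the previous fixed `mem_gb="10"` setting; larger inputs scale up from there.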