fix: switch mem_gb to mem_mb (#262) · snakemake-workflows/dna-seq-varlociraptor@6bda9b5

Commit

fix: switch mem_gb to mem_mb (#262)

* fix: combining a `mem_gb` specification with a `default-resources` specification of `mem_mb` on a cluster system yields multiple distinct memory resource definitions that can get confused -- so we should just stick to the standard `mem_mb` here

* mapping.smk: make mem_mb value an int

* fix: make the `mem_mb` requirement of `rule annotate_umis` dynamic

The [documentation of `fgbio AnnotateBamWithUmis`](https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html) states that this tool will read the entire input UMI fastq files into memory in an uncompressed format. As we work with gzipped fastq files, I would expect this to take about 4x the size of the input `fastq.gz` files, according to [Table 2](https://academic.oup.com/view-large/394488195) of this paper:
Marius Nicolae and others, LFQC: a lossless compression algorithm for FASTQ files, Bioinformatics, Volume 31, Issue 20, October 2015, Pages 3276–3281, https://doi.org/10.1093/bioinformatics/btv384

We should plan for some extra headroom, but the `bam` file is another input that already counts towards `input.size_mb`, so I think that `4 * input.size_mb` should be a good estimate.

This can be rather heavy on the memory requirements, but this should be fine on modern servers and cluster systems -- and I think this workflow should usually be run on bigger compute infrastructure. So I think this is acceptable, but as an alternative we could sort the `fastq.gz` files beforehand and then use the `fgbio AnnotateBamWithUmis` flag `--sorted`.

* formatting

* go down to `2.5 * input.size_mb`, as BAM almost doubles input size
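
The arithmetic behind the two factors can be sketched roughly as follows. This is only an illustration of the reasoning in the commit message: the file sizes are made up, the helper names are hypothetical, and the 4x expansion is the estimate quoted above, not a measured value. The sketch assumes that `input.size_mb` sums the sizes of all inputs (the `fastq.gz` files plus the `bam` file).

```python
# Illustrative sizes in MB; these numbers are made up, not measured.
fastq_gz_mb = 1000.0  # gzipped UMI FASTQ input
bam_mb = 900.0        # BAM input, assumed to be of similar size

# Assumed expansion when the FASTQ is held uncompressed in memory
# (the ~4x estimate from the commit message).
FASTQ_EXPANSION = 4.0


def expected_fastq_in_memory_mb(fastq_gz_mb):
    """Estimated memory needed to hold the uncompressed FASTQ records."""
    return FASTQ_EXPANSION * fastq_gz_mb


def requested_mem_mb(fastq_gz_mb, bam_mb, factor=2.5):
    """Memory request as a multiple of the total input size (FASTQ + BAM)."""
    return factor * (fastq_gz_mb + bam_mb)


needed = expected_fastq_in_memory_mb(fastq_gz_mb)
requested = requested_mem_mb(fastq_gz_mb, bam_mb)
old_request = requested_mem_mb(fastq_gz_mb, bam_mb, factor=4.0)

print(f"estimated need: {needed} MB")
print(f"request at 2.5x total input: {requested} MB")
print(f"request at 4x total input: {old_request} MB")
```

Under these assumed sizes, the 2.5x request still covers the estimated need with some headroom, while the 4x request would ask for almost twice as much as the estimate.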

---------

Co-authored-by: Johannes Köster <[email protected]>


dlaehnemann and johanneskoester authored Aug 24, 2023

1 parent cb893cc commit 6bda9b5

workflow/rules/mapping.smk

            
@@ -37,7 +37,7 @@ rule annotate_umis:
     params:
         extra=get_umi_read_structure,
     resources:
-        mem_gb="10",
+        mem_mb=lambda wc, input: 2.5 * input.size_mb,
     log:
         "logs/fgbio/annotate_bam/{sample}.log",
     wrapper: