fix: switch mem_gb to mem_mb #262

…of mem_mb for a cluster system leads to multiple distinct resource definitions that can get confused -- so we should just stick to the standard mem_mb here

The [documentation of `fgbio AnnotateBamWithUmis`](https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html) states that this tool reads the entire input UMI fastq files into memory in an uncompressed format. As we work with gzipped fastq files, I would expect this to take about 4x the size of the input `fastq.gz` files, according to [Table 2](https://academic.oup.com/view-large/394488195) of this paper: Marius Nicolae and others, LFQC: a lossless compression algorithm for FASTQ files, Bioinformatics, Volume 31, Issue 20, October 2015, Pages 3276–3281, https://doi.org/10.1093/bioinformatics/btv384

We should plan for some extra headroom, but the `bam` file is also an input and so adds to `input.size_mb`, which is why I think that `4*input.size_mb` should be a good estimate. This can be rather heavy on memory, but it should be fine on modern servers and cluster systems, and I think this workflow will usually be run on bigger compute infrastructure. So I think this is acceptable. As an alternative, we could sort the `fastq.gz` files beforehand and then use the `fgbio AnnotateBamWithUmis` flag `--sorted`.
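To make the estimate concrete, here is a minimal sketch of how the `4*input.size_mb` rule could be computed as an integer `mem_mb` value. The helper name `estimate_mem_mb` and its `factor` parameter are hypothetical and only for illustration; the actual rule in `mapping.smk` may be written differently.

```python
import math


def estimate_mem_mb(input_size_mb: float, factor: float = 4.0) -> int:
    """Estimate memory in MB as a multiple of the total input size.

    Hypothetical helper: `factor` defaults to the 4x expansion assumed
    above for gzipped fastq files. The result is rounded up and returned
    as an int, with a floor of 1 MB for empty inputs.
    """
    return max(1, math.ceil(factor * input_size_mb))


# In a Snakemake rule, this could be wired up roughly like this
# (sketch only, not copied from the workflow):
#   resources:
#       mem_mb=lambda wildcards, input: estimate_mem_mb(input.size_mb),

if __name__ == "__main__":
    # 2500.5 MB of combined fastq.gz and bam input
    print(estimate_mem_mb(2500.5))  # 10002
```

Rounding up to an `int` keeps the value valid for schedulers that only accept whole megabytes.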

Commits on Aug 17, 2023

mapping.smk: make mem_mb value an int

dlaehnemann authored Aug 17, 2023

Commit d21e7dc


Commits in this pull request were made on Aug 16, 17, 21, 22, and 23, 2023.