fix: switch mem_gb to mem_mb #262

Merged: 6 commits, Aug 24, 2023

Commits on Aug 16, 2023

  1. fix: the mem_gb specification plus a default-resources specification of mem_mb for a cluster system leads to multiple distinct resource definitions that can get confused -- so we should just stick to the standard mem_mb here
    dlaehnemann committed Aug 16, 2023
    835ba93

Commits on Aug 17, 2023

  1. d21e7dc

Commits on Aug 21, 2023

  1. 12f0b98

Commits on Aug 22, 2023

  1. fix: make mem_mb requirement rule annotate_umis dynamic

    The [documentation of `fgbio AnnotateBamWithUmis`](https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html) states that this tool reads the entire input UMI fastq files into memory in uncompressed form. As we work with gzipped fastq files, I would expect this to take about 4x the size of the input `fastq.gz` files, according to [Table 2](https://academic.oup.com/view-large/394488195) of this paper:
    Marius Nicolae and others, LFQC: a lossless compression algorithm for FASTQ files, Bioinformatics, Volume 31, Issue 20, October 2015, Pages 3276–3281, https://doi.org/10.1093/bioinformatics/btv384
    
    We should plan for some extra headroom, and `input.size_mb` also counts the `bam` file as another input, so I think that `4*input.size_mb` should be a good estimate.
    
    This can be rather heavy on memory, but it should be fine on modern servers and cluster systems, and I think this workflow will usually be run on larger compute infrastructure. So I think this is acceptable. As an alternative, we could sort the `fastq.gz` files beforehand and then use the `fgbio AnnotateBamWithUmis` flag `--sorted`.
    dlaehnemann authored Aug 22, 2023
    09fe039
  2. formatting

    dlaehnemann authored Aug 22, 2023
    d9a4d33
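
For illustration, the memory estimate from commit 09fe039 (`4*input.size_mb`) can be sketched in plain Python. This is not the code from the PR: the function name `estimate_mem_mb`, the `factor` parameter, and the `floor_mb` lower bound are hypothetical, and the MB conversion (bytes divided by 1024 twice) is an assumption about how Snakemake computes `input.size_mb`.

```python
import math
import os


def estimate_mem_mb(paths, factor=4, floor_mb=1000):
    """Estimate a memory request in MB from the on-disk size of all inputs.

    factor=4 follows the ~4x gzip expansion assumed in the commit message;
    floor_mb is a hypothetical lower bound for very small inputs and is not
    part of the original proposal.
    """
    total_bytes = sum(os.path.getsize(p) for p in paths)
    total_mb = total_bytes / (1024 * 1024)
    # Round up so the request never falls below the computed estimate.
    return max(floor_mb, math.ceil(factor * total_mb))


# In a Snakemake rule, the equivalent dynamic resource would likely be a
# callable along these lines (shown as a comment, since it needs Snakemake):
#   resources:
#       mem_mb=lambda wildcards, input: 4 * input.size_mb
```

Passing every input file of the rule (both the UMI `fastq.gz` and the `bam`) mirrors the reasoning in the commit message, where the `bam` size contributes extra headroom on top of the decompressed fastq data.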

Commits on Aug 23, 2023

  1. cf949f7