Skip to content

Snakemake module containing processing steps that should be performed before sequence alignment.

License

Notifications You must be signed in to change notification settings

hydra-genetics/prealignment

Repository files navigation

https://hydra-genetics-prealignment.readthedocs.io

hydra-genetics/prealignment

Snakemake module containing processing steps that should be performed before sequence alignment

Lint Snakefmt snakemake dry run integration test

License: GPL-3

Visit for more detailed information.

💬 Introduction

The module consists of alignment pre-processing steps, such as trimming and merging of .fastq-files as well as filtering out rRNA sequences from RNA reads. We strongly recommend trimming .fastq-files prior to alignment. To enable trimming the trimmer_software-stanza in the config.yaml may be set to the name of the trimming rule, e.g. fastp_pe, or None if trimming should be omitted. For rRNA filtering, SortMeRNA can be used. Input data should be specified via samples.tsv and units.tsv.

❗ Dependencies

In order to use this module, the following dependencies are required:

hydra-genetics pandas python snakemake singularity

Note! Releases of prealignment <= v0.4.0 needs tabulate<0.9.0 added in requirements.txt

🎒 Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information need to be added to these files:

Column Id Description
samples.tsv
sample unique sample/patient id, one per row
units.tsv
sample same sample/patient id as in samples.tsv
type data type identifier (one letter), can be one of Tumor, Normal, RNA
platform type of sequencing platform, e.g. NovaSeq
machine specific machine id, e.g. NovaSeq instruments have @Axxxxx
flowcell identifier of flowcell used
lane flowcell lane number
barcode sequence library barcode/index, connect forward and reverse indices by +, e.g. ATGC+ATGC
fastq1/2 absolute path to forward and reverse reads
adapter adapter sequences to be trimmed, separated by comma

Reference data

An array of reference .fasta-files should be specified in config.yaml in the section sortmerna and fasta. These files are readily available as part of the github repo. In addition, these files should be indexed using SortMeRNA and the filepath set at sortmerna and index.

✅ Testing

The workflow repository contains a small test dataset .tests/integration which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity

🚀 Usage

To use this module in your workflow, follow the description in the snakemake docs. Add the module to your Snakefile like so:

module prealignment:
    snakefile:
        github(
            "hydra-genetics/prealignment",
            path="workflow/Snakefile",
            tag="v1.1.0",
        )
    config:
        config


use rule * from prealignment as prealignment_*

Output files

The following output files should be targeted via another rule:

File Description
prealignment/merged/{sample}_{type}_fastq1.fastq.gz Merged and possibly trimmed forward reads
prealignment/merged/{sample}_{type}_fastq2.fastq.gz Merged and possibly trimmed reverse reads
prealignment/sortmerna/{sample}_{type}.fq.gz combined forward and reverse reads without rRNA sequences

🧑‍⚖️ Rule Graph

Trim and merge fastq

rule_graph

Only merge fastq

rule_graph

About

Snakemake module containing processing steps that should be performed before sequence alignment.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages