Snakemake module containing processing steps that should be performed during sequence alignment.
The module consists of alignment processing steps, such as alignment of `.fastq` files and duplicate marking of `.bam` files.
In order to use this module, the following dependencies are required:
Input data should be added to `samples.tsv` and `units.tsv`.
The following information needs to be added to these files:

| Column Id | Description |
|---|---|
| **`samples.tsv`** | |
| sample | unique sample/patient id, one per row |
| **`units.tsv`** | |
| sample | same sample/patient id as in `samples.tsv` |
| type | data type identifier (one letter), can be one of T (Tumor), N (Normal) or R (RNA) |
| platform | type of sequencing platform, e.g. `NovaSeq` |
| machine | specific machine id, e.g. NovaSeq instruments have `@Axxxxx` |
| flowcell | identifier of the flowcell used |
| lane | flowcell lane number |
| barcode | sequence library barcode/index; connect forward and reverse indices by `+`, e.g. `ATGC+ATGC` |
| fastq1/2 | absolute paths to the forward and reverse reads |
| adapter | adapter sequences to be trimmed, separated by commas |

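Before running the workflow it can help to cross-check the two sheets. The sketch below is a minimal, hypothetical validator and not part of the module: it assumes tab-separated files with a header row, columns named as in the table above (with `fastq1` and `fastq2` as two separate columns), the one-letter type codes `T`, `N` and `R`, and indices made only of `A`, `C`, `G`, `T` and `N`. The helper names `read_tsv` and `check_units` are illustrative.

```python
import csv
import io
import os
import re

# Assumption: the one-letter type codes are T (Tumor), N (Normal) and R (RNA).
VALID_TYPES = {"T", "N", "R"}
# Assumption: each index is a non-empty run of A/C/G/T/N; forward and reverse
# indices are joined by "+".
INDEX_RE = re.compile(r"^[ACGTN]+$")


def read_tsv(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def check_units(samples, units):
    """Cross-check units.tsv rows against samples.tsv rows; return the problems found."""
    problems = []
    sample_ids = {row["sample"] for row in samples}
    for n, unit in enumerate(units, start=1):
        if unit["sample"] not in sample_ids:
            problems.append(f"unit {n}: sample {unit['sample']!r} missing from samples.tsv")
        if unit["type"] not in VALID_TYPES:
            problems.append(f"unit {n}: unknown type {unit['type']!r}")
        # Every "+"-separated index must be a non-empty nucleotide string.
        if not all(INDEX_RE.match(index) for index in unit["barcode"].split("+")):
            problems.append(f"unit {n}: malformed barcode {unit['barcode']!r}")
        # The table asks for absolute paths to both read files.
        for column in ("fastq1", "fastq2"):
            if not os.path.isabs(unit[column]):
                problems.append(f"unit {n}: {column} is not an absolute path")
    return problems
```

Reading both files with `read_tsv` and passing the rows to `check_units` yields an empty list when every unit passes these checks; each problem is reported as a short message naming the offending unit row.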

You need an indexed reference genome, e.g. `reference.fna`. For bwa, the index files are generated by `bwa index`. The `.dict` file is generated using Picard `CreateSequenceDictionary`, and the `.fai` file is generated using `samtools faidx`.

File | Description |
---|---|
reference.dict | dictionary file |
reference.fna.amb | record appearance of N (or other non-ATGC) in the ref fasta |
reference.fna.ann | record ref sequences, name, length, etc |
reference.fna.bwt | the Burrows-Wheeler transformed sequence |
reference.fna.fai | FASTA index file (sequence names, lengths and offsets) |
reference.fna.pac | packaged sequence (four base pairs encode one byte) |
reference.fna.sa | suffix array index |
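A quick way to confirm that the files listed above are in place is to derive their names from the FASTA path. The sketch below is hypothetical (the helper names are not part of the module) and follows the naming in the table: bwa and samtools append their suffix to the full FASTA name, while Picard's default output replaces the `.fna` extension with `.dict`.

```python
from pathlib import Path

# Suffixes that bwa index appends to the full FASTA file name (see table above).
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(fasta):
    """List the index/dictionary files expected next to a FASTA such as reference.fna."""
    fasta = Path(fasta)
    files = [fasta.with_name(fasta.name + suffix) for suffix in BWA_SUFFIXES]
    files.append(fasta.with_name(fasta.name + ".fai"))  # from samtools faidx
    files.append(fasta.with_suffix(".dict"))  # Picard replaces the extension
    return files


def missing_reference_files(fasta):
    """Return the expected files that do not exist yet, as strings."""
    return [str(path) for path in expected_reference_files(fasta) if not path.exists()]
```

An empty result from `missing_reference_files("reference.fna")` means every file in the table is present.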

The workflow repository contains a small test dataset in `.tests/integration`, which can be run like so:

```bash
$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --use-singularity
```

To use this module in your workflow, follow the description in the snakemake docs. Add the module to your `Snakefile` like so:

```python
module alignment:
    snakefile:
        github(
            "hydra-genetics/alignment",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from alignment as alignment_*
```

Latest:

- prealignment:v0.2.0

See the `COMPATIBLITY.md` file for a complete list of compatible module versions.

The module uses the following input files:

| File | Description |
|---|---|
| **`hydra-genetics/prealignment` data** | |
| `prealignment/fastp_pe/{sample}_{flowcell}_{lane}_{type}_fastq1.fastq.gz` | trimmed forward reads |
| `prealignment/fastp_pe/{sample}_{flowcell}_{lane}_{type}_fastq2.fastq.gz` | trimmed reverse reads |
| **original fastq files** | |
| `PATH/fastq1.fastq.gz` | forward reads retrieved from `units.tsv` |
| `PATH/fastq2.fastq.gz` | reverse reads retrieved from `units.tsv` |

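The trimmed read paths above can be derived from a `units.tsv` row by filling in the wildcards. In this minimal sketch the function name `input_fastqs` and its `trimmed` switch are hypothetical; how the module actually chooses between trimmed and original reads is not shown here.

```python
# Path pattern copied from the input table above; {read} is fastq1 or fastq2.
TRIMMED = "prealignment/fastp_pe/{sample}_{flowcell}_{lane}_{type}_{read}.fastq.gz"


def input_fastqs(unit, trimmed=True):
    """Return (forward, reverse) fastq paths for one units.tsv row given as a dict.

    Hypothetical helper: trimmed=True gives the prealignment (fastp) output,
    trimmed=False gives the original paths stored in units.tsv.
    """
    if trimmed:
        # str.format ignores extra keys in the row (platform, barcode, ...).
        return tuple(TRIMMED.format(read=read, **unit) for read in ("fastq1", "fastq2"))
    return unit["fastq1"], unit["fastq2"]
```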

The following output files should be targeted via another rule:

| File | Description |
|---|---|
| `alignment/samtools_merge_bam/{sample}_{type}.bam` | aligned reads that have been duplicate marked |

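Because the output name contains only the `{sample}` and `{type}` wildcards, one target path per sample/type pair is enough, presumably with all of that pair's flowcells and lanes merged into the file. The hypothetical helper below builds that list from `units.tsv` rows (as dicts) and could, for instance, feed the `input` of a `rule all` in the calling Snakefile.

```python
# Output pattern copied from the table above.
OUTPUT = "alignment/samtools_merge_bam/{sample}_{type}.bam"


def output_targets(units):
    """Return one merged BAM path per unique (sample, type) pair, sorted.

    Hypothetical helper: units that differ only in flowcell or lane map to
    the same target, so duplicates are removed with a set.
    """
    pairs = sorted({(unit["sample"], unit["type"]) for unit in units})
    return [OUTPUT.format(sample=sample, type=type_) for sample, type_ in pairs]
```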