
hydra-genetics/fusions


https://hydra-genetics-fusions.readthedocs.io


Snakemake module containing fusion callers for both DNA and RNA.


License: GPL-3

💬 Introduction

The module consists of fusion callers for RNA and DNA. The callers take .fastq files or .bam files as input.

❗ Dependencies

hydra-genetics, pandas, python, snakemake, singularity, drmaa, tabulate


OBSERVE

The small integration test is not run in this module because the programs require large test files.

🎒 Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

| Column Id | Description |
| --- | --- |
| samples.tsv | |
| sample | unique sample/patient id, one per row |
| units.tsv | |
| sample | same sample/patient id as in samples.tsv |
| type | data type identifier (one letter), can be one of T (Tumor), N (Normal), R (RNA) |
| platform | type of sequencing platform, e.g. NovaSeq |
| machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
| flowcell | identifier of flowcell used |
| lane | flowcell lane number |
| barcode | sequence library barcode/index, connect forward and reverse indices by +, e.g. ATGC+ATGC |
| fastq1/2 | absolute path to forward and reverse reads |
| adapter | adapter sequences to be trimmed, separated by comma |
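Before launching a run it can help to sanity-check the sample sheets against the column descriptions above. The following is a minimal sketch, not part of the module: the helper names `read_sample_ids` and `validate_units` are hypothetical, and the barcode regex and the absolute-path check are assumptions derived from the table rather than the module's own schema validation.

```python
import csv
import io
import os
import re

# Hypothetical helper: required columns follow the units.tsv table above.
REQUIRED_UNIT_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]
VALID_TYPES = {"T", "N", "R"}  # Tumor, Normal, RNA
# Assumed barcode format: one index, or forward+reverse joined by "+".
BARCODE_RE = re.compile(r"^[ACGTN]+(\+[ACGTN]+)?$")


def read_sample_ids(samples_tsv: str) -> set[str]:
    """Collect the sample ids from the text of a samples.tsv file."""
    reader = csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")
    return {row["sample"] for row in reader}


def validate_units(units_tsv: str, sample_ids: set[str]) -> list[str]:
    """Return human-readable problems found in the text of a units.tsv file."""
    reader = csv.DictReader(io.StringIO(units_tsv), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_UNIT_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["sample"] not in sample_ids:
            problems.append(f"line {line_no}: unknown sample {row['sample']!r}")
        if row["type"] not in VALID_TYPES:
            problems.append(f"line {line_no}: invalid type {row['type']!r}")
        if not BARCODE_RE.match(row["barcode"]):
            problems.append(f"line {line_no}: malformed barcode {row['barcode']!r}")
        for key in ("fastq1", "fastq2"):
            if not os.path.isabs(row[key]):
                problems.append(f"line {line_no}: {key} is not an absolute path")
    return problems
```

An empty list means no problems were detected; the checks only cover what the table states and do not replace the pipeline's own validation.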

Reference data

Each fusion caller requires its own reference files. Refer to the documentation of each program to obtain the correct references, or see this documentation for the reference files used in one pipeline that uses this module.

🚀 Usage

To use this module in your workflow, follow the description in the Snakemake documentation. Add the module to your Snakefile like so:

module fusions:
    snakefile:
        github(
            "hydra-genetics/fusions",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from fusions as fusions_*

The workflow is designed for WGS data, i.e. large datasets that require substantial compute resources. On HPC clusters, it is recommended to use a cluster profile and run something like:

snakemake -s /path/to/Snakefile --profile my-awesome-profile

Compatibility

Latest:

  • alignment:v0.5.1
  • prealignment:v1.2.0

See the COMPATIBLITY.md file for a complete list of compatible module versions.

Input files

| File | Description |
| --- | --- |
| alignment/samtools_merge_bam/{sample}_{type}.bam | merged and sorted DNA data from the alignment module |
| alignment/star/{sample}_{type}.bam | aligned RNA data from the alignment module |
| prealignment/merged/{sample}_{type}_fastq1.fastq.gz | trimmed and merged forward-read fastq file from the prealignment module |
| prealignment/merged/{sample}_{type}_fastq2.fastq.gz | trimmed and merged reverse-read fastq file from the prealignment module |
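The `{sample}` and `{type}` placeholders in the input table are Snakemake wildcards filled from samples.tsv and units.tsv. As a rough illustration of how one sample/type pair resolves to concrete paths, here is a small sketch. The function `input_files` is hypothetical, and the rule that the type letter alone decides between the DNA (samtools merge) BAM and the RNA (STAR) BAM is an assumption based on the table descriptions, not code taken from the module.

```python
# Assumption: T/N units are DNA and use the merged BAM, R units are RNA
# and use the STAR BAM. Both use the merged fastq files from prealignment.
DNA_TYPES = {"T", "N"}
RNA_TYPES = {"R"}


def input_files(sample: str, unit_type: str) -> dict[str, str]:
    """Return the upstream file paths for one sample/type pair (hypothetical)."""
    if unit_type in DNA_TYPES:
        bam = f"alignment/samtools_merge_bam/{sample}_{unit_type}.bam"
    elif unit_type in RNA_TYPES:
        bam = f"alignment/star/{sample}_{unit_type}.bam"
    else:
        raise ValueError(f"unknown unit type: {unit_type!r}")
    prefix = f"prealignment/merged/{sample}_{unit_type}"
    return {
        "bam": bam,
        "fastq1": f"{prefix}_fastq1.fastq.gz",
        "fastq2": f"{prefix}_fastq2.fastq.gz",
    }
```

For example, `input_files("P001", "R")["bam"]` gives `alignment/star/P001_R.bam`. Each caller uses only the subset of these files it needs.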

Output files

The following output files should be targeted via another rule:

| File | Description |
| --- | --- |
| fusions/arriba/{sample}_{type}.fusions.tsv | RNA fusion predictions from Arriba |
| fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.tsv | RNA fusion predictions from STAR-Fusion |
| fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt | RNA fusion predictions from FusionCatcher |
| fusions/gene_fuse_report/{sample}_{type}_gene_fuse_fusions_report.txt | filtered DNA fusion predictions from GeneFuse |
| fusions/filter_fuseq_wes/{sample}_{type}.fuseq_wes.report.csv | filtered DNA fusion predictions from FuSeq_WES |
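The output files have to be requested by a rule in the calling workflow, typically a rule `all`. A minimal sketch for building that target list from (sample, type) pairs is shown below. The helper `fusion_targets` is hypothetical, and the split of callers by type (RNA callers for R units, DNA callers for T/N units) is an assumption inferred from the descriptions in the output table.

```python
# Output patterns copied from the table above. Which callers apply to which
# unit type is an assumption: RNA callers for "R", DNA callers otherwise.
RNA_OUTPUTS = [
    "fusions/arriba/{sample}_{type}.fusions.tsv",
    "fusions/star_fusion/{sample}_{type}/star-fusion.fusion_predictions.tsv",
    "fusions/fusioncatcher/{sample}_{type}/final-list_candidate-fusion-genes.hg19.txt",
]
DNA_OUTPUTS = [
    "fusions/gene_fuse_report/{sample}_{type}_gene_fuse_fusions_report.txt",
    "fusions/filter_fuseq_wes/{sample}_{type}.fuseq_wes.report.csv",
]


def fusion_targets(units: list[tuple[str, str]]) -> list[str]:
    """Build a sorted, de-duplicated target list from (sample, type) pairs."""
    targets = set()  # units.tsv may list several lanes per sample/type
    for sample, unit_type in units:
        patterns = RNA_OUTPUTS if unit_type == "R" else DNA_OUTPUTS
        targets.update(p.format(sample=sample, type=unit_type) for p in patterns)
    return sorted(targets)
```

Inside a Snakefile the same effect is usually achieved with Snakemake's built-in `expand()`, passing the `zip` combinator so that each sample is paired only with its own type instead of every combination.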

🧑‍⚖️ Rule Graph

[Figure: rule graph of the fusions module]