This is a Snakemake pipeline that takes Oxford Nanopore Technologies (ONT) sequencing data (fastq) as input. It generates fastq stats using NanoStat, processes and filters the fastq files using Pychopper, maps the reads to the genome using minimap2 and uses TALON to assemble and quantify transcripts. It is forked from ANNSeq. Below is the DAG of the pipeline:
[Figure: DAG of the pipeline]
Input:
- ONT fastq reads
- Reference genome assembly in fasta format
- GTF: GENCODE GTF; tested on the v38 comprehensive CHR gene annotation

Dependencies:
- miniconda
- The rest of the dependencies (including Snakemake) are installed via conda through the environment.yml file
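Before editing config.yml, it can help to sanity-check the three input files listed above so that a wrong or truncated path fails early rather than partway through the pipeline. The sketch below only tests basic format signatures: FASTQ records start with "@", FASTA records start with ">", and GTF feature lines have nine tab-separated columns. check_inputs and cat_any are hypothetical helpers, not part of the pipeline, and the paths you pass are assumed to be the ones you intend to put in config.yml.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the pipeline inputs (not part of the pipeline).
# It only inspects format signatures; it does not validate file contents fully.
set -eu

# Print a file to stdout, decompressing it first if the name ends in .gz.
cat_any() {
    case $1 in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac
}

# check_inputs <reads.fastq[.gz]> <genome.fa[.gz]> <annotation.gtf[.gz]>
check_inputs() {
    local fastq=$1 fasta=$2 gtf=$3 f
    # Every input must exist and be non-empty.
    for f in "$fastq" "$fasta" "$gtf"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
    # FASTQ records start with '@'.
    [ "$(cat_any "$fastq" | head -c 1)" = "@" ] \
        || { echo "not FASTQ: $fastq" >&2; return 1; }
    # FASTA records start with '>'.
    [ "$(cat_any "$fasta" | head -c 1)" = ">" ] \
        || { echo "not FASTA: $fasta" >&2; return 1; }
    # The first non-comment GTF line must have 9 tab-separated columns.
    cat_any "$gtf" | awk -F'\t' '
        !/^#/ { found = 1; if (NF != 9) bad = 1; exit }
        END   { if (!found || bad) exit 1 }' \
        || { echo "not GTF (expected 9 tab-separated columns): $gtf" >&2; return 1; }
    echo "inputs look OK"
}
```

For example, check_inputs reads.fastq.gz genome.fa annotation.gtf prints "inputs look OK" or names the first file that fails a check. The file names here are placeholders.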
Clone the repository:
git clone --recursive https://github.com/egustavsson/pipeline-isoforms-ONT-TALON.git
Create the conda environment for the pipeline, which will install all the dependencies:
cd pipeline-isoforms-ONT-TALON
conda env create -f environment.yml
Edit config.yml to set up the working directory and input files/directories. The snakemake command should be issued from within the pipeline directory. Before you run any of the snakemake commands, make sure to first activate the conda environment using the command conda activate ont_talon. Then run the pipeline:
cd pipeline-isoforms-ONT-TALON
conda activate ont_talon
snakemake --use-conda -j <num_cores> all
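The <num_cores> placeholder above has to be replaced with the number of cores Snakemake may use. One option is to derive it from the machine. The sketch below assumes that leaving one core free for the rest of the system is desirable. pick_cores is a hypothetical helper, and the fallbacks cover Linux (nproc) and macOS (sysctl hw.ncpu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a core count for snakemake -j, keeping one core free.
set -eu

pick_cores() {
    local n
    # nproc is from GNU coreutils (Linux); sysctl -n hw.ncpu is the macOS fallback.
    n=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
    if [ "$n" -gt 1 ]; then
        echo $((n - 1))
    else
        echo 1
    fi
}
```

Usage: snakemake --use-conda -j "$(pick_cores)" all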
It is a good idea to do a dry run (using the -n parameter) to see what the pipeline would do before executing it.
snakemake --use-conda -n all
You can visualise the processes to be executed in a DAG:
snakemake --dag | dot -Tpng > dag.png
To exit a running snakemake pipeline, hit ctrl+c on the terminal. If the pipeline is running in the background, you can send a TERM signal, which will stop the scheduling of new jobs and wait for all running jobs to finish.
killall -TERM snakemake
To deactivate the conda environment:
conda deactivate
Output directory structure:
working directory
|--- config.yml    # a copy of the parameters used in the pipeline
|--- Nanostat/
|    |-- # output of NanoStat - fastq stats
|--- Pychopper/
|    |-- # output of Pychopper - filtered fastq
|--- Mapping/
|    |-- # output of minimap2 - aligned reads
|--- Talon/
     |-- # output of TALON
     |-- _talon.gtf                       # assembled transcripts
     |-- _talon_abundance_filtered.tsv    # transcript abundance
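To get a quick overview of the TALON results, the abundance table can be summarised directly from the shell. The sketch below counts transcripts per novelty category. It assumes the TSV has a header row containing a transcript_novelty column, as in TALON's abundance output, and locates that column by name rather than by position. summarise_novelty is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count transcripts per novelty category in a TALON
# abundance TSV. Assumes a header row with a "transcript_novelty" column.
set -eu

summarise_novelty() (
    # pipefail so an awk failure is not masked by sort's exit status.
    set -o pipefail
    awk -F'\t' '
        NR == 1 {
            # Find the novelty column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "transcript_novelty") col = i
            if (!col) {
                print "no transcript_novelty column in header" | "cat 1>&2"
                err = 1
                exit 1
            }
            next
        }
        { count[$col]++ }   # one row per transcript
        END {
            if (err) exit 1
            for (k in count) print k "\t" count[k]
        }
    ' "$1" | sort
)
```

Pass it the path of the _talon_abundance_filtered.tsv file in Talon/. It prints one line per novelty category with its transcript count, sorted by category name.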