This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The pipeline is built using Nextflow and processes data using the following steps:
- Prefetch - downloads SRA data
- Fasterq-dump - converts SRA files to FastQ files
- sort_fastq_files - sorts FastQ files into single-end and paired-end
Prefetch is a tool from the SRA-Tools package. It downloads and saves one .sra file for each SRA run accession.
For further reading and documentation see the SRA-Tools documentation.
Fasterq-dump is another tool from the SRA-Tools package. It converts .sra files to FastQ files in a multithreaded manner. During the conversion, the files are automatically split into forward and reverse reads according to the sequencing strategy. The FastQ files are then compressed to .fastq.gz files with pigz to reduce the size of the output.
For further reading and documentation see the SRA-Tools documentation.
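The download and conversion steps described above can be sketched as a sequence of command lines. This is a minimal illustration, not the pipeline's actual process definitions: the helper names `build_conversion_commands` and `build_compress_command`, the thread count, and the choice of `--split-3` are assumptions made for this example. The sketch only builds the argument lists and does not run any tool.

```python
def build_conversion_commands(accession, threads=4):
    """Return argument lists for downloading and converting one SRA run.

    Hypothetical helper: the flags the pipeline really passes may differ.
    """
    # prefetch downloads the .sra file for the given run accession.
    prefetch = ["prefetch", accession]
    # fasterq-dump converts the run to FastQ using several threads.
    # --split-3 (an assumption here) writes mates to _1/_2 files and
    # reads without a mate to a separate file.
    fasterq_dump = [
        "fasterq-dump", accession,
        "--split-3",
        "--threads", str(threads),
    ]
    return [prefetch, fasterq_dump]


def build_compress_command(fastq_files, threads=4):
    """Return the pigz argument list that compresses FastQ files in place."""
    # pigz -p sets the number of compression threads; each input file
    # is replaced by a .gz file.
    return ["pigz", "-p", str(threads), *fastq_files]


if __name__ == "__main__":
    for command in build_conversion_commands("SRR000001", threads=2):
        print(" ".join(command))
    print(" ".join(build_compress_command(["SRR000001_1.fastq", "SRR000001_2.fastq"], threads=2)))
```

The accession `SRR000001` is only a placeholder. To execute the commands, each list could be passed to `subprocess.run(command, check=True)` on a machine where SRA-Tools and pigz are installed.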
sort_fastq_files sorts the FastQ files according to their read layout, which is either singleEnd or pairedEnd. In paired-end experiments, the conversion to FastQ sometimes produces unmatched reads, which are sorted into a separate directory called 'pairedEnd/unmatched_reads'.
Output directory: results/sorted_output_files
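The sorting logic can be illustrated with a short sketch. It assumes a file naming convention that is not stated in this document: paired-end runs yield `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz`, single-end runs yield `<accession>.fastq.gz`, and unmatched reads of a paired-end run also end up in `<accession>.fastq.gz`. The function name `plan_sorting` and the `singleEnd` and `pairedEnd` subdirectory names are likewise assumptions; only 'pairedEnd/unmatched_reads' is taken from the description above.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed naming: ACC_1.fastq.gz / ACC_2.fastq.gz for mates, ACC.fastq.gz otherwise.
FASTQ_NAME = re.compile(r"^(?P<acc>.+?)(?:_(?P<mate>[12]))?\.fastq\.gz$")


def plan_sorting(file_names):
    """Map each compressed FastQ file name to a path below the output directory."""
    # Group files by run accession; the mate key is "1", "2", or None.
    by_accession = defaultdict(dict)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip anything that is not a .fastq.gz file
        by_accession[match["acc"]][match["mate"]] = name

    plan = {}
    for mates in by_accession.values():
        # A run counts as paired-end only if both mate files are present.
        is_paired = "1" in mates and "2" in mates
        for mate, name in mates.items():
            if not is_paired:
                subdir = "singleEnd"
            elif mate is None:
                # File without a mate suffix next to a complete pair.
                subdir = "pairedEnd/unmatched_reads"
            else:
                subdir = "pairedEnd"
            plan[name] = str(PurePosixPath(subdir) / name)
    return plan


if __name__ == "__main__":
    files = ["SRR1_1.fastq.gz", "SRR1_2.fastq.gz", "SRR1.fastq.gz", "SRR2.fastq.gz"]
    for source, target in plan_sorting(files).items():
        print(source, "->", target)
```

The function only computes a plan and moves nothing, so it can be checked against a directory listing before any files are touched.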