This Nextflow script performs quality control (QC) and de novo assembly of sequencing reads.
It uses FastQC and Trimmomatic for QC and SPAdes for assembly.
The script assumes that the following tools and their dependencies are installed:
- SPAdes version 3.15.5
- Nextflow version 23.10.1
- FastQC version 0.12.1
- Trimmomatic version 0.39
- entrez-direct
- sra-tools
- Biopython
- pigz
Install the tools available through conda with the following commands:
conda install -c bioconda -c conda-forge entrez-direct sra-tools fastqc trimmomatic pigz -y
conda install python=2.7 -y
conda install -c conda-forge biopython -y
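Before running the pipeline, it can help to confirm that every tool is actually on your PATH. The helper below is a minimal sketch; `check_tools` is a hypothetical name, and the executable names in the usage line (for example `trimmomatic` for the bioconda wrapper and `esearch` from entrez-direct) are assumptions about how your installation exposes each tool.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Prints "found: <tool>" to stdout for each one present and
# "missing: <tool>" to stderr otherwise; returns 1 if anything is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Assumed executable names (spades.py and nextflow come from the manual
# installs described below):
# check_tools fastqc trimmomatic fasterq-dump esearch pigz spades.py nextflow
```

Biopython is a Python library rather than a command, so it is not covered here; `python -c "import Bio"` is one way to check it separately.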
Install SPAdes from source, following the SPAdes documentation (https://github.com/ablab/spades/):
wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5.tar.gz
tar -xzf SPAdes-3.15.5.tar.gz
cd SPAdes-3.15.5
./spades_compile.sh
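After compilation, the SPAdes executables are placed in the `bin/` directory of the source tree, so that directory has to be on your PATH for the pipeline to find `spades.py`. The sketch below uses a hypothetical `add_to_path` helper that only prepends a directory once, so it is safe to call repeatedly (for example from `~/.bashrc`).

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already on PATH: nothing to do
    *) PATH="$1:$PATH"; export PATH ;;
  esac
}

# Run from inside SPAdes-3.15.5 once ./spades_compile.sh has finished:
# add_to_path "$PWD/bin"
# spades.py --version   # prints the installed SPAdes version
# spades.py --test      # SPAdes' built-in check on a small toy dataset
```

The same helper works for whichever directory you place the `nextflow` launcher in.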
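Install Nextflow, following the Nextflow documentation (https://www.nextflow.io/docs/latest/getstarted.html). Nextflow runs on Java, which you can install with SDKMAN:
sdk install java 17.0.6-tem
(or) sdk install java 17.0.6-amzn
Then download the Nextflow launcher and make it executable: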
wget -qO- https://get.nextflow.io | bash
chmod +x nextflow
Move nextflow to a directory on your PATH, then test the installation:
nextflow run <test_script>
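A Nextflow script is run on a directory of reads with:
nextflow run $NEXTFLOW_SCRIPT --reads $FASTQ_READS_DIR
For this pipeline, clone the repository and run main.nf from inside it:
git clone https://github.com/surakshavinod/QC-and-Assembly-using-Nextflow.git
cd QC-and-Assembly-using-Nextflow
nextflow run main.nf --reads $READS_DIR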
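Before launching, a quick check that the reads directory holds complete read pairs can save a failed run. This is a minimal sketch under an assumption: paired files are named `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz`, which is what `fasterq-dump --split-files` followed by `pigz` produces. Whether `main.nf` expects exactly this naming is not stated here, and `check_pairs` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: verify that every *_1.fastq.gz has a *_2.fastq.gz mate
# and vice versa. Problems go to stderr; returns 1 if any file is unpaired
# or if no *_1.fastq.gz files exist at all.
check_pairs() {
  local dir="$1" status=0 found=0 r1 r2
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue           # glob matched nothing
    found=1
    r2="${r1%_1.fastq.gz}_2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing mate for $(basename "$r1")" >&2
      status=1
    fi
  done
  for r2 in "$dir"/*_2.fastq.gz; do
    [ -e "$r2" ] || continue
    if [ ! -e "${r2%_2.fastq.gz}_1.fastq.gz" ]; then
      echo "missing mate for $(basename "$r2")" >&2
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no *_1.fastq.gz files in $dir" >&2
    status=1
  fi
  return "$status"
}
```

Used together with the run command: `check_pairs "$READS_DIR" && nextflow run main.nf --reads $READS_DIR`.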
- The Trimmomatic trimming parameters are hard-coded in main.nf. Change them there as needed.
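To see what kind of flag values are involved, here is a sketch that prints (but does not run) a paired-end Trimmomatic command. The trimming steps are the illustrative ones from the Trimmomatic manual, **not** necessarily the values hard-coded in main.nf: `ILLUMINACLIP:TruSeq3-PE.fa:2:30:10` clips adapters (2 seed mismatches, palindrome clip threshold 30, simple clip threshold 10), `LEADING:3` and `TRAILING:3` drop bases below quality 3 from the read ends, `SLIDINGWINDOW:4:15` cuts once the mean quality of a 4-base window falls below 15, and `MINLEN:36` discards reads shorter than 36 bases. The helper name `trim_cmd`, the output file names, and the availability of `TruSeq3-PE.fa` in the working directory are all assumptions.

```shell
# Hypothetical helper: print a paired-end Trimmomatic command for one sample.
# Uses the bioconda "trimmomatic" wrapper; with the plain jar you would call
# "java -jar trimmomatic-0.39.jar PE ..." instead. Example values only.
trim_cmd() {
  local s="$1" threads="${2:-4}"
  printf '%s ' trimmomatic PE -threads "$threads" \
    "${s}_1.fastq.gz" "${s}_2.fastq.gz" \
    "${s}_1.paired.fastq.gz" "${s}_1.unpaired.fastq.gz" \
    "${s}_2.paired.fastq.gz" "${s}_2.unpaired.fastq.gz" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36
  echo
}
```

For example, `trim_cmd SRR1 8` prints the command for `SRR1_1.fastq.gz` and `SRR1_2.fastq.gz` using 8 threads.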
NOTE: You can also use your own FASTQ files for the analysis. Otherwise, download reads from the NCBI Sequence Read Archive (SRA):
- Download the FASTQ files with fasterq-dump (part of sra-tools):
fasterq-dump \
$SRR_ID \
--threads $NO_OF_THREADS \
--outdir $OUT_DIR \
--split-files \
--skip-technical
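To fetch several runs at once, the command above can be wrapped in a loop. This is a minimal sketch: the accession list file (one run ID per line, with blank lines and `#` comments skipped), the `fetch_runs` name, and the optional `FASTERQ_DUMP` variable for pointing at a specific binary are all hypothetical.

```shell
# Hypothetical helper: download every SRA run listed in a text file.
# Usage: fetch_runs IDS_FILE OUTDIR [THREADS]
fetch_runs() {
  local ids_file="$1" outdir="$2" threads="${3:-4}" id
  mkdir -p "$outdir"
  # "|| [ -n "$id" ]" also handles a last line without a trailing newline
  while IFS= read -r id || [ -n "$id" ]; do
    case "$id" in ''|'#'*) continue ;; esac   # skip blanks and comments
    echo "downloading $id" >&2
    # </dev/null keeps the tool from reading the ID list on stdin
    "${FASTERQ_DUMP:-fasterq-dump}" "$id" \
      --threads "$threads" \
      --outdir "$outdir" \
      --split-files \
      --skip-technical < /dev/null || return 1
  done < "$ids_file"
}
```

For example, `fetch_runs ids.txt reads 8` downloads every listed run into `reads/`, stopping at the first failed download.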
- Compress the FASTQ files with pigz (parallel gzip):
pigz -9f $fastq_files
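When fasterq-dump has written many files into one directory, a small loop compresses them all. This sketch (the `compress_fastq` name is hypothetical) uses pigz with `-p` to set the number of threads when pigz is installed and falls back to plain gzip otherwise; both remove the original `.fastq` after compressing it.

```shell
# Hypothetical helper: compress every .fastq file in a directory.
# -9 = best compression, -f = overwrite an existing .gz file.
compress_fastq() {
  local dir="$1" threads="${2:-4}" f
  for f in "$dir"/*.fastq; do
    [ -e "$f" ] || continue            # no uncompressed .fastq files
    if command -v pigz >/dev/null 2>&1; then
      pigz -9f -p "$threads" "$f"
    else
      gzip -9f "$f"
    fi || return 1
  done
}
```

For example, `compress_fastq "$OUT_DIR" 8` compresses the files downloaded above.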
The pipeline writes its output to a folder called "results" in your working directory:
- raw_qa/ folder contains the HTML reports generated by FastQC
- trim/ folder contains the trimmed reads
- asm/ folder contains the SPAdes output
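After a run, a quick sanity check of the results folder can catch a step that silently produced nothing. This is a sketch under assumptions: the `raw_qa/`, `trim/` and `asm/` layout comes from the list above, while the expectation that SPAdes' usual `contigs.fasta` appears somewhere under `asm/` depends on how main.nf names its output. `check_results` is a hypothetical helper.

```shell
# Hypothetical helper: check the pipeline output and count assembled contigs.
# Usage: check_results [RESULTS_DIR]   (defaults to ./results)
check_results() {
  local res="${1:-results}" status=0
  if [ -z "$(find "$res/raw_qa" -name '*.html' 2>/dev/null)" ]; then
    echo "no FastQC HTML reports in $res/raw_qa" >&2; status=1
  fi
  if [ -z "$(ls -A "$res/trim" 2>/dev/null)" ]; then
    echo "$res/trim is missing or empty" >&2; status=1
  fi
  if [ -z "$(find "$res/asm" -name contigs.fasta 2>/dev/null)" ]; then
    echo "no contigs.fasta under $res/asm" >&2; status=1
  fi
  # Each FASTA header line starts with ">", so counting them counts contigs
  find "$res/asm" -name contigs.fasta 2>/dev/null | while read -r f; do
    echo "$f: $(grep -c '^>' "$f") contigs"
  done
  return "$status"
}
```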