A Nextflow workflow that runs vsearch on short-read 16S data. The workflow includes the following steps:
- Trimming (cutadapt) and QC (FastQC)
- Merging and filtering reads (vsearch)
- Clustering and chimera removal (vsearch and swarm)
- OTU classification (vsearch sintax)
- The workflow is configured to work with Docker or Singularity. The singularity profile works with SLURM by default; an sbatch job can be submitted with the example script provided.
- FASTQ files must be named following the pattern *_L001_R{1,2}_001.fastq.gz
- Install Nextflow
- Install Singularity or Docker
- Move to the 'data' folder in the main directory, then download and unzip:
  - https://www.drive5.com/sintax/rdp_16s_v18.fa.gz
  - https://mothur.s3.us-east-2.amazonaws.com/wiki/silva.gold.bacteria.zip
  - Run unzip -p silva.gold.bacteria.zip | sed -e "s/[.-]//g" > gold.fasta
  - Run gunzip rdp_16s_v18.fa.gz
- In the file nextflow.config, edit runOptions so that it contains the actual path of the 'data' folder in your environment
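
The database download steps above can be collected into one shell function. This is a minimal sketch, not part of the workflow itself: the function names `strip_gaps` and `prepare_databases` are hypothetical, `curl` is assumed to be installed (`wget` would work equally well), and the function is assumed to be called from the main directory so that `data` resolves correctly. The `sed` expression is the one given in the steps; it removes every `.` and `-` character from the unzipped file.

```shell
# URLs as listed in the setup steps.
RDP_URL="https://www.drive5.com/sintax/rdp_16s_v18.fa.gz"
GOLD_URL="https://mothur.s3.us-east-2.amazonaws.com/wiki/silva.gold.bacteria.zip"

# Remove '.' and '-' characters from a stream (same sed expression as in the steps).
strip_gaps() {
    sed -e "s/[.-]//g"
}

# Download and unpack both reference files inside ./data.
# The body runs in a subshell, so the caller's working directory is unchanged.
prepare_databases() (
    set -eu
    cd data
    curl -fLO "$RDP_URL"     # assumption: curl is available
    curl -fLO "$GOLD_URL"
    unzip -p silva.gold.bacteria.zip | strip_gaps > gold.fasta
    # A failed unzip would leave an empty file behind, so check for that.
    [ -s gold.fasta ] || { echo "gold.fasta is empty" >&2; exit 1; }
    gunzip rdp_16s_v18.fa.gz
)
```

After sourcing the snippet, running `prepare_databases` from the main directory should leave `gold.fasta` and `rdp_16s_v18.fa` in the `data` folder.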
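Run the workflow on your own data, choosing one of the two profiles:
nextflow run 16SProcessing.nf --in_dir directory/with/fastq/files -profile (docker OR singularity)
Test run with the test_16S_reads directory:
nextflow run 16SProcessing.nf --in_dir test_16S_reads -profile (docker OR singularity)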
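
Because the input files must follow the `*_L001_R{1,2}_001.fastq.gz` naming pattern, it can help to check a directory before launching a run. The function below is a hypothetical helper, not part of the workflow. It assumes that every R1 file needs an R2 mate in the same directory, which seems reasonable given that the workflow merges reads, but is not stated explicitly.

```shell
# Check that a directory contains correctly named FASTQ pairs.
# Returns non-zero if no R1 file is found or if any R1 file lacks its R2 mate.
check_fastq_pairs() {
    in_dir=$1
    status=0
    found=0
    for r1 in "$in_dir"/*_L001_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        found=1
        # Derive the expected R2 name from the R1 name.
        r2="${r1%_L001_R1_001.fastq.gz}_L001_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1" >&2
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no *_L001_R1_001.fastq.gz files in $in_dir" >&2
        status=1
    fi
    return "$status"
}
```

A possible use is to chain it in front of the run command, for example `check_fastq_pairs test_16S_reads && nextflow run 16SProcessing.nf --in_dir test_16S_reads -profile docker`.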