Version 13 of the bioinformatics pipeline for SARS-CoV-2 sequence analysis used at Folkehelseinstituttet (FHI, the Norwegian Institute of Public Health).
Docker-based solution for sequence analysis of SARS-CoV-2 Nanopore samples
To install, clone the repository and build the Docker image:
git clone https://github.com/folkehelseinstituttet/FHI_SC2_Pipeline_Nanopore
docker build -t garcianacho/fhisc2:Nanopore FHI_SC2_Pipeline_Nanopore/
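To run the simplified script (SARS-CoV-2_Nanopore_Simplified_Docker_V12.sh), use the command for the primer set in question from the folder that contains the data.
ArticV4.1: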
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Simplified_Docker_V12.sh ArticV4
ArticV3:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Simplified_Docker_V12.sh ArticV3
Midnight:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Simplified_Docker_V12.sh Midnight
Note that older versions of Docker might require the --privileged flag, and that multi-user systems might require the -u 1000 flag to run.
The script expects the following folder structure:
./_
  |-ExperimentXX.xlsx
  |-barcodeX.fastq
  |-barcodeY.fastq
  |-barcodeZ.fastq
  |-...
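A quick pre-flight check of this flat layout can catch a misplaced file before the container starts. The sketch below is not part of the pipeline: the function name check_simplified_layout is hypothetical, and the rule of exactly one .xlsx file is an assumption read off the structure shown above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the simplified (flat) layout.
# Assumption: exactly one .xlsx file and at least one .fastq file
# sit directly in the working folder.
check_simplified_layout() {
    local dir="${1:-.}"
    local n_xlsx n_fastq
    # $(( ... )) strips the padding some wc implementations add.
    n_xlsx=$(( $(find "$dir" -maxdepth 1 -type f -name '*.xlsx' | wc -l) ))
    n_fastq=$(( $(find "$dir" -maxdepth 1 -type f -name '*.fastq' | wc -l) ))
    if [ "$n_xlsx" -ne 1 ]; then
        echo "expected exactly one .xlsx file, found $n_xlsx" >&2
        return 1
    fi
    if [ "$n_fastq" -lt 1 ]; then
        echo "no .fastq files found in $dir" >&2
        return 1
    fi
    echo "ok: 1 xlsx, $n_fastq fastq"
}
```

Calling check_simplified_layout with no argument checks the current folder, which is the folder mounted into the container by -v $(pwd):/home/docker/Fastq.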
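To run the full script (SARS-CoV-2_Nanopore_Docker_V13.sh), use the command for the primer set in question.
ArticV4: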
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V13.sh ArticV4
ArticV3:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V13.sh ArticV3
Midnight:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V13.sh Midnight
Note that older versions of Docker might require the --privileged flag, and that multi-user systems might require the -u 1000 flag to run.
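The three commands differ only in the primer-set argument, and the two optional flags go between docker run and the image name. A minimal sketch of a wrapper that assembles the command line is shown here. The function name build_run_cmd and the environment variables PRIVILEGED and DOCKER_USER are hypothetical and exist only for illustration; they are not options of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker command for one primer set.
# PRIVILEGED=1 adds --privileged; DOCKER_USER=<uid> adds -u <uid>.
# Both variables are illustrative and not defined by the pipeline.
build_run_cmd() {
    local primers="$1"
    local flags="-it --rm"
    case "$primers" in
        ArticV3|ArticV4|Midnight) ;;
        *) echo "unknown primer set: $primers" >&2; return 1 ;;
    esac
    if [ "${PRIVILEGED:-0}" = "1" ]; then
        flags="$flags --privileged"
    fi
    if [ -n "${DOCKER_USER:-}" ]; then
        flags="$flags -u $DOCKER_USER"
    fi
    # The current folder is mounted as /home/docker/Fastq, as in the commands above.
    printf '%s\n' "docker run $flags -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V13.sh $primers"
}
```

For example, PRIVILEGED=1 DOCKER_USER=1000 build_run_cmd Midnight prints the Midnight command with both flags added. The function only prints the command, so it can be inspected before it is run.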
The script expects the following folder structure, in which the .fastq files are placed in a separate folder for each sample:
./_
  |-ExperimentXX.xlsx
  |-GridXXX
      |-OppsettXXX
          |-XXXXXXXXFAXXXXXXXXXX
              |-sequencing_summary_FAXXXXX.txt
              |-fastq_pass
                  |-barcode1
                      |-XXXX_pass_barcode01_XXXX.fastq
                      |-YYYY_pass_barcode01_YYYY.fastq
                  |-barcode2
                  |-barcode3
                  |-....
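With this nested layout it can be useful to confirm that every barcode folder actually holds reads before starting a run. The sketch below lists each barcode folder found under a fastq_pass folder together with its number of .fastq files. The function name count_fastq_per_barcode is hypothetical, and the sketch assumes the barcode folders sit directly inside fastq_pass as drawn above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<barcode folder> <number of .fastq files>"
# for every barcode folder below a fastq_pass folder.
count_fastq_per_barcode() {
    local root="${1:-.}"
    local bc n
    find "$root" -type d -name 'barcode*' -path '*/fastq_pass/*' | sort |
    while IFS= read -r bc; do
        # Count only files directly inside the barcode folder.
        n=$(( $(find "$bc" -maxdepth 1 -type f -name '*.fastq' | wc -l) ))
        echo "$(basename "$bc") $n"
    done
}
```

A barcode reported with 0 files points at an empty folder, which is worth checking against the .xlsx sheet before the pipeline is started.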
The script also expects a .xlsx file that contains the position of each sample on a 96-well plate, the links between barcodes and sequence IDs, and the DNA concentration (alternatively, this column can hold the Ct values). A template of the .xlsx file can be downloaded here
The pipeline produces the following output:
-Summary including mutations found, pangolin lineage, number of reads, coverage, depth, etc.
-Bam files
-Consensus sequences
-Aligned consensus sequences
-Consensus nucleotide sequence for gene S
-Indel and frameshift identification run against FHI's frameshift database
-Quality-control plot for the plate to detect possible contamination
-Phylogenetic-tree plot of the samples
-Noise during variant calling across the genome
-Quality-control for contaminations/low-quality samples
-Amplicon efficacy of the selected primer-set for all the samples