Comprehensive pipeline for detection of TB from ONT adaptive sequencing and amplicon data.

HKU-BAL/ONT-TB-NF

ONT-TB-NF

The ONT-TB-NF pipeline is built to detect Mycobacterium tuberculosis (TB) antibiotic-resistance genes from ONT sequencing data.

The input data can come from several ONT sequencing modes:

  • Adaptive sequencing (e.g., with readfish or UNCALLED),
  • Amplicon sequencing (amplifying specific regions of the TB genome),
  • Standard whole genome sequencing (WGS).

The ONT-TB-NF pipeline covers basecalling, quality control, alignment to target regions, variant calling, and antimicrobial resistance prediction.

Features

  • One-command pipeline from sequencing data to TB analysis report.
  • Tailor-made for adaptive sequencing, amplicon, and WGS data.
  • Optional basecalling with ONT's Guppy, so the pipeline can start from FAST5 files.

Installation

Install Nextflow by using the following command:

curl -s https://get.nextflow.io | bash 

Install required packages with conda and docker:

docker pull hkubal/clair3:v0.1-r12
docker pull quay.io/biocontainers/tb-profiler:4.3.0--pypyh5e36f6f_0

conda create -n ont_tb samtools=1.15.1 minimap2=2.24 nanoplot=1.40.2 mosdepth=0.3.3 flye=2.9.1 nanofilt fastqc bedtools -c bioconda
conda activate ont_tb
# clone ONT-TB-NF
git clone https://github.com/HKU-BAL/ONT-TB-NF.git
cd ONT-TB-NF
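
Before the first run, it can help to confirm that the installed tools are actually visible on PATH. The following is a minimal sketch, not part of the pipeline: check_tools is a hypothetical helper, and the executable names (for example NanoPlot and NanoFilt) are assumptions based on the packages installed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every tool from the argument list that is not on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions derived from the conda/docker packages above.
if ! check_tools nextflow docker samtools minimap2 NanoPlot mosdepth flye NanoFilt fastqc bedtools; then
    echo "The tools listed above were not found on PATH." >&2
fi
```

Run it inside the ont_tb environment; if it prints nothing, all listed executables were found.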

Usage

Activate the environment and print the available options of each pipeline:

conda activate ont_tb

nextflow run run_tb.nf --help
nextflow run run_tb_amplicon.nf --help

For adaptive sequencing

Make sure the ont_tb environment is active (conda activate ont_tb).

TB_NF_DIR={ONT-TB-NF PATH}
NF_S=${TB_NF_DIR}/run_tb.nf

SAMPLE_ID={NAME}
FQ={YOUR FQ FILE}
THREADS={THREAD}                  # threads number, e.g. 16
OUT_DIR={ABSOLUTE OUTPUT PATH}    # output path, absolute path required

nextflow run ${NF_S} \
--read_fq ${FQ} \
--sample_name ${SAMPLE_ID} \
--threads ${THREADS} \
--output_dir ${OUT_DIR}
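
Because the output path must be absolute, a relative path can be converted before it is passed to --output_dir. This is a small sketch under that assumption; to_abs_path is a hypothetical helper and results/sample1 is a placeholder folder name.

```shell
# Hypothetical helper: turn a possibly relative path into an absolute one.
# The path does not need to exist yet; relative paths are resolved against $PWD.
to_abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$PWD/$1" ;;
    esac
}

OUT_DIR="$(to_abs_path "results/sample1")"   # placeholder output folder
echo "${OUT_DIR}"
```

The resulting ${OUT_DIR} can then be used in the nextflow run command above.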

For amplicon sequencing

Make sure the ont_tb environment is active (conda activate ont_tb).

For amplicon sequencing data, provide the amplicon regions as a BED file with the --amplicon_bed option.

TB_NF_DIR={ONT-TB-NF PATH}
NF_S=${TB_NF_DIR}/run_tb_amplicon.nf

AMPLICON_BED={AMPLICON BED}       # your amplicon region 
FQ={YOUR FQ FILE}
SAMPLE_ID={NAME}
THREADS={THREAD}                  # threads number, e.g. 16
OUT_DIR={ABSOLUTE OUTPUT PATH}    # output path, absolute path required

nextflow run ${NF_S} \
--read_fq ${FQ} \
--sample_name ${SAMPLE_ID} \
--amplicon_bed ${AMPLICON_BED} \
--threads ${THREADS} \
--output_dir ${OUT_DIR}
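
A BED file is tab-separated, with at least three columns: contig name, 0-based start, and end. The sketch below writes a tiny example and runs a basic sanity check on it. The contig name, coordinates, and region names are placeholders and not real targets; the contig name must match the reference used by the pipeline, which is not stated here. validate_bed is a hypothetical helper and does not handle header or comment lines.

```shell
# Write a small example BED file (tab-separated: contig, start, end, name).
# Contig name and coordinates are placeholders, not real amplicon targets.
printf 'Chromosome\t1000\t2500\tregion_1\n' >  example_amplicon.bed
printf 'Chromosome\t5000\t6200\tregion_2\n' >> example_amplicon.bed

# Hypothetical helper: every line needs >= 3 columns with integer start < end.
validate_bed() {
    awk -F'\t' '
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d is not a valid BED record: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

validate_bed example_amplicon.bed && echo "BED looks valid"
```

A file that passes this check can be assigned to AMPLICON_BED in the command above.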

Using Guppy for basecalling

FAST5_DIR={Input FAST5 folders}
GUPPY_BASECALLER_PATH={Guppy basecaller path}                       # e.g. guppy_basecaller
GUPPY_CONFIG={Guppy config file path}                               # e.g. dna_r10.4_e8.1_sup.cfg

SAMPLE_ID={NAME}
TB_NF_DIR={ONT-TB-NF PATH}
NF_S=${TB_NF_DIR}/run_tb_amplicon.nf                                # e.g. run_tb.nf or run_tb_amplicon.nf
THREADS={THREAD}                                                    # threads number, e.g. 16
OUT_DIR={ABSOLUTE OUTPUT PATH}                                      # output path, absolute path required

nextflow run ${NF_S} \
--fast5_dir ${FAST5_DIR} \
--guppy_basecaller_path ${GUPPY_BASECALLER_PATH} \
--guppy_config_path ${GUPPY_CONFIG} \
--guppy_options "--device 'cuda:0'" \
--sample_name ${SAMPLE_ID} \
--threads ${THREADS} \
--output_dir ${OUT_DIR}

Pipeline Summary

For WGS and adaptive sequencing data, use the default run_tb.nf pipeline.

For amplicon sequencing data, use the run_tb_amplicon.nf pipeline.

In general, the ONT-TB-NF pipeline performs the following tasks:

[Image: pipeline workflow diagram]

Pipeline results

Here is a brief description of the output files created for each sample; optional modules are labeled with [O]:

[O] Basecalling results at:       {YOUR OUTPUT DIR}/0_bc
    QC results at:                {YOUR OUTPUT DIR}/1_qc
    Alignment results at:         {YOUR OUTPUT DIR}/2_aln
    Variant calling results at:   {YOUR OUTPUT DIR}/3_vc
    TB analysis report at:        {YOUR OUTPUT DIR}/4_tb
[O] Taxonomic classification at:  {YOUR OUTPUT DIR}/5_mpn
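
After a run finishes, the folder layout above can be checked quickly. This is a minimal sketch: check_outputs is a hypothetical helper, and folders marked [O] may legitimately be absent when the optional module was not run.

```shell
# Hypothetical helper: report which documented result folders exist
# under the given output directory.
check_outputs() {
    out_dir="$1"
    for sub in 0_bc 1_qc 2_aln 3_vc 4_tb 5_mpn; do
        if [ -d "${out_dir}/${sub}" ]; then
            echo "found    ${sub}"
        else
            echo "missing  ${sub}"
        fi
    done
}

# Falls back to the current directory if OUT_DIR is not set.
check_outputs "${OUT_DIR:-.}"
```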

Requirements