Metagenomic and genomic surveillance of antimicrobial resistance in hospital wastewater (ATTACK-AMR) Analysis Pipeline
The cooperative research project ATTACK-AMR aims to deliver alternative, non-antibiotic therapies to combat antimicrobial-resistant (AMR) pathogens. AMR is one of the biggest challenges facing healthcare and is rising rapidly as a result of antibiotic overuse. Replacing antibiotics with alternative products is intended to delay the development of resistance and restore the activity of antibiotics that are no longer effective due to resistance.
This document is a guide to running the analysis pipeline, which is written in Snakemake.
This Snakemake pipeline requires the package manager Conda and the workflow management system Snakemake. Additional dependencies not handled by Snakemake are described in Section 1.3.
$ curl -sL \
"https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > \
"Miniconda3.sh"
$ bash Miniconda3.sh
$ conda update conda
$ rm Miniconda3.sh
$ conda install wget
$ conda config --add channels conda-forge
$ conda update -n base --all
$ conda install -n base mamba
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
This creates an isolated environment containing the latest Snakemake. To activate it:
$ conda activate snakemake
To verify that Snakemake is installed:
$ snakemake --help
Install git and gawk. gawk is required for the filtering stage of the databases.
$ mamba install git
$ mamba install gawk
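Before moving on, it can be useful to confirm that the tools installed so far are visible on your PATH. The following is a minimal sketch, not part of the pipeline itself; the helper name check_tools is hypothetical, and the tool list simply mirrors the commands installed above.

```shell
# Report which of the given commands are available on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
# (check_tools is a hypothetical helper, not part of ATTACK-AMR.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools installed in the steps above; run with the snakemake environment active.
check_tools conda mamba snakemake git gawk || echo "Install the missing tools before continuing."
```

Any tool reported as missing can be installed with the corresponding command above.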
Download ATTACK-AMR from the online repository, or clone it from the command line:
$ git clone https://github.com/bioinfodlsu/attack_amr_pipeline
The pipeline requires, at the very least: (1) metagenomic sequences (sample sequences can be downloaded at (tentative)), and (2) reference protein databases for CARD, [Kraken](https://benlangmead.github.io/aws-indexes/k2), and MGE.
Note: For CARD, only the nucleotide_fasta_protein_homolog_model.fasta file was used. For Kraken, the Standard-16 database was used for taxonomic analysis. The CARD and MGE FASTA files were renamed to card.fasta and MGE.fasta, respectively.
For Metaphlan, the tool must first be installed locally:
$ conda install -c bioconda metaphlan
Once it is installed, running
$ metaphlan --install --bowtie2db /attack_amr_pipeline/metaphlan
should download the latest database for you.
The downloaded sequences and databases should be placed in the following directories:
- Metagenomic Sequences: ~/data
- CARD: ~/card_db
- Kraken: ~/kraken2_db
- Metaphlan: ~/metaphlan
- MGE: ~/MGE_db
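The layout above can be created and checked with a short helper. This is only a sketch: prepare_inputs is a hypothetical name, and the base directory is passed as an argument because it is an assumption here that the `~` in the list refers to a single common parent directory.

```shell
# Create the expected input directories under a base directory and
# report which of them are still empty.
# (prepare_inputs is a hypothetical helper; the base directory is an
# assumption standing in for the "~" used in the list above.)
prepare_inputs() {
    base="$1"
    for dir in data card_db kraken2_db metaphlan MGE_db; do
        mkdir -p "$base/$dir"
        if [ -z "$(ls -A "$base/$dir")" ]; then
            echo "empty: $base/$dir"
        fi
    done
}

# Example: prepare_inputs "$HOME"
```

Directories reported as empty still need their sequences or database files before the pipeline is run.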
These and other input parameters are specified in a YAML-format config file; a config.yaml is provided in the config folder.
After constructing a config.yaml file and with the snakemake conda environment activated, you can call the pipeline from the top-level directory of ATTACK-AMR:
$ cd attack_amr_pipeline
$ snakemake --use-conda --cores all
Outputs are stored in the top-level directory of ATTACK-AMR. The following outputs should be present.
ARG (CARD):
- card_db/card_length.txt
- card_out/ARG_genemat.txt
- metaxa2/metaxa_genus.txt
Taxonomic (Kraken and Metaphlan):
- kreport2mpa_norm/merged_metakraken_abundance_table.txt
- metaphlan/merged_abundance_table.txt
MGE:
- MGE_db/MGE_length.txt
- MGE_out/MGE_genemat.txt
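The list above can be checked automatically once a run has finished. The following is a minimal sketch; check_outputs is a hypothetical helper, and it assumes only the file paths listed above, relative to the top-level directory.

```shell
# Verify that each expected output file exists and is non-empty.
# Returns 0 if all are present, 1 otherwise.
# (check_outputs is a hypothetical helper; paths are those listed above.)
check_outputs() {
    root="$1"
    rc=0
    for f in \
        card_db/card_length.txt \
        card_out/ARG_genemat.txt \
        metaxa2/metaxa_genus.txt \
        kreport2mpa_norm/merged_metakraken_abundance_table.txt \
        metaphlan/merged_abundance_table.txt \
        MGE_db/MGE_length.txt \
        MGE_out/MGE_genemat.txt
    do
        if [ -s "$root/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return "$rc"
}

# From the top-level directory of ATTACK-AMR:
check_outputs . || echo "Some expected outputs are missing."
```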
Before running the R analysis notebooks, it is best to place all of the above output files in one directory (the same directory that contains the R analysis notebooks).
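One way to gather the output files is sketched below. The helper name collect_outputs is hypothetical, and the destination directory is passed as an argument rather than assumed. The listed files all have distinct base names, so copying them into a single directory does not overwrite anything.

```shell
# Copy the pipeline output files into a single directory for the notebooks.
# Files that do not exist are reported on stderr and skipped.
# (collect_outputs is a hypothetical helper, not part of the pipeline.)
collect_outputs() {
    root="$1"
    dest="$2"
    mkdir -p "$dest"
    for f in \
        card_db/card_length.txt \
        card_out/ARG_genemat.txt \
        metaxa2/metaxa_genus.txt \
        kreport2mpa_norm/merged_metakraken_abundance_table.txt \
        metaphlan/merged_abundance_table.txt \
        MGE_db/MGE_length.txt \
        MGE_out/MGE_genemat.txt
    do
        if [ -f "$root/$f" ]; then
            cp "$root/$f" "$dest/"
        else
            echo "skipped (not found): $f" >&2
        fi
    done
    return 0
}

# Example, from the top-level directory: collect_outputs . notebooks
```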
Before running the notebooks located in the repository, download the metadata (metadata.csv) and the CARD drug class information (card_drug_class.txt).
Separate analysis notebooks are provided for ARG analysis, taxonomic analysis, and MGE analysis. These are located in the notebooks folder in the repository. The notebooks are written in R and produce data visualizations, including stacked bar plots, PCA plots, and box plots for diversity analysis.
Note that some library dependencies may first need to be installed with the following R command:
install.packages('<insert name of library>')