Workflow

Gavin Douglas edited this page Oct 12, 2018 · 44 revisions

Below is an overview of the PICRUSt2 workflow, with example commands for processing 16S sequencing data to obtain E.C. number and KEGG ortholog (KO) abundances. The E.C. number abundances can then be used to calculate MetaCyc pathway abundances and coverages. Other supported gene family databases may be more informative, but they cannot be collapsed to pathways by default. See the side-bar for more details on individual commands.

Note that you can pass the -h option to any of the scripts below to get a description of its options.

The entire pipeline can be run with this command (details):

picrust2_pipeline.py -s study_seqs.fna -i study_seqs.biom -o picrust2_out_pipeline \
                     --threads 10 -n

You can also run each step individually using the commands below. This is useful when you run into problems with picrust2_pipeline.py and want to isolate the issue, or when you only want to re-run part of the PICRUSt2 pipeline.
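
For re-running only part of the pipeline, the individual commands on this page can be held in an ordered list and executed from a chosen step onward. The following is a minimal sketch, not part of PICRUSt2: the step labels, `commands_from`, and `run_from` are hypothetical names introduced here, and the commands are copied verbatim from this page, so file names and thread counts would need adjusting for your data. The description step is left out for brevity.

```python
import subprocess

# Commands copied from this page. The step labels on the left are
# hypothetical names used only by this sketch.
STEPS = [
    ("place", ["place_seqs.py", "-s", "study_seqs.fna", "-o", "placed_seqs.tre",
               "--threads", "10", "--intermediate", "placement_working"]),
    ("hsp_16S", ["hsp.py", "-i", "16S", "-t", "placed_seqs.tre",
                 "-o", "16S_predicted", "-p", "1", "-n"]),
    ("hsp_EC", ["hsp.py", "-i", "EC", "-t", "placed_seqs.tre",
                "-o", "EC_predicted", "-p", "10"]),
    ("hsp_KO", ["hsp.py", "-i", "KO", "-t", "placed_seqs.tre",
                "-o", "KO_predicted", "-p", "10"]),
    ("metagenome_EC", ["metagenome_pipeline.py", "-i", "study_seqs.biom",
                       "-m", "16S_predicted.tsv", "-f", "EC_predicted.tsv",
                       "-p", "10", "-o", "EC_metagenome_out"]),
    ("metagenome_KO", ["metagenome_pipeline.py", "-i", "study_seqs.biom",
                       "-m", "16S_predicted.tsv", "-f", "KO_predicted.tsv",
                       "-p", "10", "-o", "KO_metagenome_out"]),
    ("pathways", ["run_minpath.py",
                  "-i", "EC_metagenome_out/pred_metagenome_unstrat.tsv",
                  "-o", "pathways_out", "--intermediate", "minpath_working",
                  "-p", "10"]),
]


def commands_from(start):
    """Return the commands for the step labelled `start` and all later steps."""
    names = [name for name, _ in STEPS]
    if start not in names:
        raise ValueError("unknown step: " + start)
    return [cmd for _, cmd in STEPS[names.index(start):]]


def run_from(start, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands_from(start):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, which helps
            # isolate where a problem occurs.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands from the EC metagenome step onward.
    run_from("metagenome_EC", dry_run=True)
```

Starting with a dry run makes it easy to confirm which commands would be executed before committing to a long run.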

Place amplicon sequence variants (or OTUs) into the reference phylogeny (details)

place_seqs.py -s study_seqs.fna -o placed_seqs.tre --threads 10 --intermediate placement_working

Run hidden-state prediction to get 16S copy numbers and E.C. number and KO abundances per predicted genome (details)

Note that because the -n option is given in the 16S command, nearest-sequenced taxon index (NSTI) values will be added to the 16S prediction table.

hsp.py -i 16S -t placed_seqs.tre -o 16S_predicted -p 1 -n

hsp.py -i EC -t placed_seqs.tre -o EC_predicted -p 10

hsp.py -i KO -t placed_seqs.tre -o KO_predicted -p 10

Predict E.C. and KO abundances in sequencing samples (adjusts gene family abundances by 16S sequence abundance) (details)

metagenome_pipeline.py -i study_seqs.biom \
                       -m 16S_predicted.tsv \
                       -f EC_predicted.tsv \
                       -p 10 \
                       -o EC_metagenome_out


metagenome_pipeline.py -i study_seqs.biom \
                       -m 16S_predicted.tsv \
                       -f KO_predicted.tsv \
                       -p 10 \
                       -o KO_metagenome_out
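
To make the adjustment in this step concrete, here is a minimal sketch of the calculation, assuming that each ASV's abundance in a sample is divided by its predicted 16S copy number and that the result is multiplied by the predicted gene family copies and summed over ASVs. It illustrates the idea only and is not the code in metagenome_pipeline.py; the function name `predict_metagenome` and all values in the toy tables are hypothetical.

```python
def predict_metagenome(asv_counts, copy_16s, gene_copies):
    """Sketch of per-sample gene family abundance prediction.

    asv_counts:  {sample: {asv: sequence count}}
    copy_16s:    {asv: predicted 16S copy number}
    gene_copies: {asv: {gene family: predicted copies per genome}}
    """
    result = {}
    for sample, counts in asv_counts.items():
        totals = {}
        for asv, count in counts.items():
            # Adjust the ASV's sequence abundance by its 16S copy number.
            normalized = count / copy_16s[asv]
            # Add this ASV's contribution to every gene family it carries.
            for family, copies in gene_copies[asv].items():
                totals[family] = totals.get(family, 0.0) + normalized * copies
        result[sample] = totals
    return result


# Toy input tables (hypothetical values, not real predictions).
asv_counts = {"sample1": {"ASV1": 20, "ASV2": 15}}
copy_16s = {"ASV1": 2, "ASV2": 5}
gene_copies = {
    "ASV1": {"EC:1.1.1.1": 1, "EC:2.7.7.7": 2},
    "ASV2": {"EC:1.1.1.1": 3},
}

print(predict_metagenome(asv_counts, copy_16s, gene_copies))
# → {'sample1': {'EC:1.1.1.1': 19.0, 'EC:2.7.7.7': 20.0}}
```

In the toy data, ASV1 contributes 20 / 2 = 10 genome equivalents and ASV2 contributes 15 / 5 = 3, so EC:1.1.1.1 totals 10 × 1 + 3 × 3 = 19.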

Infer MetaCyc pathway abundances and coverages based on predicted E.C. number abundances (details)

run_minpath.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv \
               -o pathways_out \
               --intermediate minpath_working \
               -p 10

Add descriptions as a new column in gene family and pathway abundance tables (details)

add_descriptions.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv -m EC -o EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv

add_descriptions.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv -m KO -o KO_metagenome_out/pred_metagenome_unstrat_descrip.tsv

add_descriptions.py -i pathways_out/path_abun_unstrat.tsv -m METACYC -o pathways_out/path_abun_unstrat_descrip.tsv
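
Conceptually, adding descriptions is a lookup keyed on the first column of the table. The sketch below shows that idea on a small tab-separated table held in memory. It is not the code in add_descriptions.py: the function name `add_description_column`, the placement of the new column directly after the identifier column, the "not_found" placeholder, and the toy abundance values are all assumptions made for illustration.

```python
import csv
import io


def add_description_column(table_text, descriptions):
    """Insert a 'description' column after the first (identifier) column.

    table_text:   tab-separated table with a header row
    descriptions: {identifier: description}
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    header = next(reader)
    writer.writerow([header[0], "description"] + header[1:])
    for row in reader:
        # "not_found" is a hypothetical placeholder for unknown identifiers.
        desc = descriptions.get(row[0], "not_found")
        writer.writerow([row[0], desc] + row[1:])
    return out.getvalue()


# Toy table and mapping (abundance values are hypothetical).
table = "function\tsample1\nEC:1.1.1.1\t19.0\nEC:2.7.7.7\t20.0\n"
descriptions = {
    "EC:1.1.1.1": "Alcohol dehydrogenase",
    "EC:2.7.7.7": "DNA-directed DNA polymerase",
}

print(add_description_column(table, descriptions), end="")
```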