Workflow

Gavin Douglas edited this page Sep 5, 2018 · 44 revisions

Below is an overview of the PICRUSt2 workflow, with example commands for processing 16S sequencing data to obtain E.C. number abundances. These E.C. number abundances can then be used to calculate MetaCyc pathway abundances and coverages. Other gene family databases are also supported and may be more informative, but they cannot be collapsed to pathways by default. See the side-bar for more details on individual commands.

Note that you can pass the -h option to any of the scripts below to get a description of its options.

1. Place amplicon sequence variants (or OTUs) into reference phylogeny (details)

place_seqs.py -s STUDY_SEQs.fna -o placed_seqs.tre --threads 10 --intermediate placement_working

2. Run hidden-state prediction to get 16S copy numbers, E.C. number abundances, and KO abundances per predicted genome (details)

hsp.py -i 16S -t placed_seqs.tre -o 16S_predicted -p 10 -n

hsp.py -i EC -t placed_seqs.tre -o EC_predicted -p 10

hsp.py -i KO -t placed_seqs.tre -o KO_predicted -p 10

3. Predict gene family abundances in sequencing samples (adjusts gene family abundances by 16S sequence abundance) (details)

metagenome_pipeline.py -i study_seqs.biom \
                       -m 16S_predicted.tsv \
                       -f EC_predicted.tsv \
                       -p 4 \
                       -o metagenome_out
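
The adjustment in step 3 can be illustrated with a small worked example. This is only a sketch of the arithmetic, assuming that each sequence's read count is divided by its predicted 16S copy number and then multiplied by its predicted gene family copy number, with the results summed per sample; the real script may differ in detail. The numbers and the `toy_table` and `predict_abundance` helpers are made up for illustration and are not part of PICRUSt2.

```shell
#!/usr/bin/env bash
# Toy input (made-up numbers), tab-separated columns:
#   sequence ID, read count in one sample,
#   predicted 16S copy number, predicted copies of one E.C. number
toy_table() {
    printf 'ASV1\t100\t2\t1\n'
    printf 'ASV2\t30\t3\t2\n'
}

# Sum over sequences of (read count / 16S copies) * gene family copies.
predict_abundance() {
    awk -F '\t' '{ total += ($2 / $3) * $4 } END { printf "%.1f\n", total }'
}

# ASV1 contributes (100 / 2) * 1 = 50 and ASV2 contributes (30 / 3) * 2 = 20.
toy_table | predict_abundance   # prints 70.0
```

Dividing by the 16S copy number first means that a sequence from an organism with many 16S copies does not inflate the predicted gene family abundance.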

4. Infer pathway abundances (details)

run_minpath.py -i metagenome_out/pred_metagenome_unstrat.tsv \
               -m /path/to/picrust/MinPath/ec2metacyc_picrust_prokaryotic.txt \
               -o pathway_out \
               --intermediate minpath_working \
               -p 4

5. Add descriptions as new column in gene family and pathway abundance tables (details)

add_descriptions.py -i metagenome_out/pred_metagenome_unstrat.tsv -m EC -o metagenome_out/pred_metagenome_unstrat_descrip.tsv

add_descriptions.py -i pathway_out/path_abun_unstrat.tsv -m METACYC -o pathway_out/path_abun_unstrat_descrip.tsv
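
The steps above can be chained in a single script. The following is a minimal sketch, assuming bash and that the PICRUSt2 scripts are on the PATH. The `run` and `workflow` helpers and the `DRY_RUN`, `TREE`, and `MINPATH_MAP` variables are hypothetical names introduced here, not part of PICRUSt2. It also assumes that the tree written by place_seqs.py in step 1 is the one passed to hsp.py with -t. By default it only prints the commands, so they can be checked before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch: chain the five workflow steps in one script.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
set -eu

DRY_RUN="${DRY_RUN:-1}"
TREE="placed_seqs.tre"   # assumed: tree written by place_seqs.py in step 1
# Placeholder path from step 4; point it at your own installation.
MINPATH_MAP="/path/to/picrust/MinPath/ec2metacyc_picrust_prokaryotic.txt"

# Print the command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

workflow() {
    # 1. Place sequences into the reference phylogeny.
    run place_seqs.py -s STUDY_SEQs.fna -o "$TREE" --threads 10 --intermediate placement_working
    # 2. Hidden-state prediction for 16S, EC and KO.
    run hsp.py -i 16S -t "$TREE" -o 16S_predicted -p 10 -n
    run hsp.py -i EC -t "$TREE" -o EC_predicted -p 10
    run hsp.py -i KO -t "$TREE" -o KO_predicted -p 10
    # 3. Predict gene family abundances per sample.
    run metagenome_pipeline.py -i study_seqs.biom -m 16S_predicted.tsv -f EC_predicted.tsv -p 4 -o metagenome_out
    # 4. Infer pathway abundances.
    run run_minpath.py -i metagenome_out/pred_metagenome_unstrat.tsv -m "$MINPATH_MAP" -o pathway_out --intermediate minpath_working -p 4
    # 5. Add description columns to both tables.
    run add_descriptions.py -i metagenome_out/pred_metagenome_unstrat.tsv -m EC -o metagenome_out/pred_metagenome_unstrat_descrip.tsv
    run add_descriptions.py -i pathway_out/path_abun_unstrat.tsv -m METACYC -o pathway_out/path_abun_unstrat_descrip.tsv
}

workflow
```

Running the script as-is prints eight commands, each prefixed with `+`. Running it with `DRY_RUN=0` executes them in order and, because of `set -eu`, stops at the first step that fails.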