Infer pathway abundances

Pathway abundances are calculated using the same approach as HUMAnN2, based on the abundances of gene families that can be linked to reactions within pathways (E.C. numbers are regrouped to MetaCyc reactions by default). By default, pathways are first identified as present or absent with MinPath.

Either a structured or an unstructured pathway mapfile can be input (the default mapfile is structured). The mapfile is used to identify which pathways are likely present based on the presence of the requisite gene families.

Input gene family abundances can be stratified or unstratified by contributing organisms; however, stratified pathway abundances will only be written if the input gene families are in stratified format. Note that stratified abundances refer to how much each predicted genome is contributing to the community pathway abundances (not the predicted level of that pathway within that organism alone!). To get pathway abundances broken down by contributing sequence you need to use the --per_sequence_contrib option (see below).

There are two default mapfiles used by this script. Both are specified by default, so you do not need to provide them yourself, but it is useful to understand what the script does with them. First, E.C. numbers are regrouped to MetaCyc reactions (RXNs) using this mapfile: default_files/pathway_mapfiles/ec_level4_to_metacyc_rxn.tsv. These MetaCyc RXNs are then used to infer MetaCyc pathway abundances using this mapfile: default_files/pathway_mapfiles/metacyc_path2rxn_struc_filt_pro.txt. This second mapfile maps reactions to pathways for the subset of MetaCyc pathways found in prokaryotes.
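
As a rough illustration of the regrouping step, the sketch below sums E.C. number abundances into reaction abundances for a single sample. It is not the PICRUSt2 implementation. The function name, the toy IDs, and the rule that an E.C. number contributes its full abundance to every reaction it maps to are all assumptions made for the example.

```python
from collections import defaultdict


def regroup_to_reactions(ec_abun, ec_to_rxns):
    """Sum E.C. number abundances into reaction abundances for one sample.

    ec_abun: dict mapping E.C. number -> abundance.
    ec_to_rxns: dict mapping E.C. number -> list of reaction IDs.

    Assumption (for illustration only): an E.C. number adds its full
    abundance to every reaction it maps to, and E.C. numbers that are
    absent from the map are dropped.
    """
    rxn_abun = defaultdict(float)
    for ec, abun in ec_abun.items():
        for rxn in ec_to_rxns.get(ec, []):
            rxn_abun[rxn] += abun
    return dict(rxn_abun)


# Toy data: these IDs are made up and do not come from the default mapfile.
ec_abun = {"EC:1.1.1.1": 10.0, "EC:2.7.1.1": 4.0, "EC:9.9.9.9": 3.0}
ec_to_rxns = {"EC:1.1.1.1": ["RXN-A", "RXN-B"], "EC:2.7.1.1": ["RXN-B"]}

rxn_abun = regroup_to_reactions(ec_abun, ec_to_rxns)
print(rxn_abun)
```

In this toy case the unmapped E.C. number contributes nothing, and the reaction shared by two E.C. numbers receives the sum of both abundances.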

Use this command to run the pathway pipeline (including MinPath) on the predicted gene family output and get unstratified pathway abundances (for pathways found in prokaryotes):

pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_unstrat.tsv.gz \
                    -o pathways_out \
                    --intermediate minpath_working \
                    -p 1

The input arguments and options to this command are:

-i metagenome_out/pred_metagenome_strat.tsv.gz - Stratified or unstratified output of metagenome_pipeline.py
-m MINPATH_MAPFILE - path to mapfile of gene families to pathways of interest (default: default_files/pathway_mapfiles/metacyc_path2rxn_struc_filt_pro.txt).
-o pathways_out - Output folder where the final pathway abundance tables are written.
--intermediate - Optional folder where intermediate files will be written (otherwise the intermediate files will not be kept).
--coverage - Calculate pathway coverages as well as abundances. Coverages are an alternative way to identify which pathways are present. Note that these values are experimental and only useful for advanced users. Coverage is also calculated using the same approach as HUMAnN2.
--no_gap_fill - Turn off gap-filling (which boosts the lowest reaction abundance in a pathway by default).
--no_regroup - Turn off re-grouping to reactions: this is necessary if the gene families you are inputting can be directly related to pathways.
--skip_minpath - Do not run MinPath to identify which pathways are present as a first pass (MinPath is run by default).
--regroup_map - Mapfile to use for regrouping input gene family abundances to reactions (default: default_files/pathway_mapfiles/ec_level4_to_metacyc_rxn.tsv).
-p INT - Number of processes to run in parallel.
--per_sequence_contrib - Option to specify that stratified abundances should be reported in terms of the contribution by each predicted genome rather than how much each genome is contributing to the overall community abundance. In other words, pathway abundances will be calculated for each individual predicted genome. Both --per_sequence_abun and --per_sequence_function need to be specified when this option is set. Stratified coverages will only be reported when this option is used (and --coverage is set). As of v2.2.0-b, unstratified pathway abundances based on the community-wide pathway abundances and also based on the per-sequence pathway abundances will be output when this option is used.
--per_sequence_abun - Path to sequence abundance table normalized by marker-gene abundances (file output by metagenome_pipeline.py step named "seqtab_norm.tsv.gz" by default).
--per_sequence_function - Path to predicted gene family abundance per sequence (main output file of hidden-state prediction step named "EC_predicted.tsv.gz" by default).
--wide_table - Flag to specify that wide-format stratified table should be output rather than metagenome contribution table. This is the deprecated method of generating stratified tables since it is extremely memory intensive (default: False).
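
The per-sequence options listed above depend on one another, so when scripting the call it can help to assemble the command in one place. The sketch below is a hypothetical helper, not part of PICRUSt2. It uses only flags described in the list above, and the file paths in the example are placeholders. It builds the argument list without running anything and raises an error if only one of the two per-sequence files is given.

```python
import shlex


def build_pathway_command(in_table, out_dir, processes=1,
                          per_sequence_abun=None,
                          per_sequence_function=None,
                          coverage=False):
    """Return the argument list for a pathway_pipeline.py call.

    Hypothetical helper: if either per-sequence file is given, both are
    required and --per_sequence_contrib is added automatically.
    """
    cmd = ["pathway_pipeline.py", "-i", in_table, "-o", out_dir,
           "-p", str(processes)]

    if per_sequence_abun or per_sequence_function:
        if not (per_sequence_abun and per_sequence_function):
            raise ValueError(
                "--per_sequence_contrib needs both --per_sequence_abun "
                "and --per_sequence_function")
        cmd += ["--per_sequence_contrib",
                "--per_sequence_abun", per_sequence_abun,
                "--per_sequence_function", per_sequence_function]

    if coverage:
        cmd.append("--coverage")
    return cmd


# Placeholder paths: adjust them to wherever your own files were written.
cmd = build_pathway_command(
    "EC_metagenome_out/pred_metagenome_strat.tsv.gz",
    "pathways_out",
    per_sequence_abun="EC_metagenome_out/seqtab_norm.tsv.gz",
    per_sequence_function="EC_predicted.tsv.gz",
    coverage=True,
)

# Print a shell-ready version of the command for inspection.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once PICRUSt2 is installed and the paths point at real files.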

Please first check our FAQ if you have any questions about PICRUSt2.

For other general questions and comments about PICRUSt2, please search the PICRUSt Google group. If your question has not been answered there, please start a new thread.

To report a bug or to make a feature request please make a new issue at the top of this page.
