Skip to content

q2 picrust2 Tutorial

Gavin Douglas edited this page Aug 8, 2019 · 67 revisions

Introduction

The basic PICRUSt2 plugin for QIIME2 is available here. This plugin can be integrated with the output files of the default QIIME2 pipelines - i.e. with denoising approaches rather than closed-reference OTU picking. Installation instructions are available on the QIIME2 plugin library website.

Before running this tutorial we recommend that you take a look through the standalone tutorial for a better description of the tool (click Tutorial on the right side-bar). Note that many of the features available in the standalone version are not implemented in the QIIME2 plugin yet. You should also read through the key limitations of metagenome inference.

Running details

Running the full PICRUSt2 pipeline requires at least 16 GB of RAM (and often more). This is a much higher memory requirement than used for PICRUSt1, since predictions are no longer pre-calculated for a particular 16S rRNA gene database. Instead, the genome prediction steps are run for every dataset.

There are two pipelines for running PICRUSt2 in q2-picrust2:

  1. Default pipeline that involves running sequence placement with EPA-NG. The tutorial below will focus on this approach.

  2. Modified pipeline that reads in placed sequences from another tool (e.g. q2-fragment-insertion). To avoid the increased memory requirements you can use this pipeline to use an alternative read placement approach, which is typically the most memory-intensive step of PICRUSt2. See here for details.

q2-picrust2 Running Example

You can see a description of the PICRUSt2 command by running:

qiime picrust2 full-pipeline --help

The required inputs are --i-table and --i-seq, which need to correspond to QIIME2 artifacts of types FeatureTable[Frequency] and FeatureData[Sequence], respectively. The Feature Table needs to contain the abundances of ASVs (i.e. a BIOM table) and the sequence file needs to be a FASTA file of the sequences for each ASV. There are many options available for the standalone PICRUSt2 version, but currently only selecting the number of threads (--p-threads), hidden-state prediction (HSP) method (--p-hsp-method), and maximum NSTI value (--p-max-nsti) are specified.

The --p-max-nsti option specifies how distantly placed a sequence needs to be in the reference phylogeny before it is excluded. The default cut-off is 2. In human datasets used for testing PICRUSt2 the only ASVs above this default cut-off were 18S sequences erroneously in 16S datasets, which suggests this cut-off is highly lenient. For environmental datasets a higher proportion of ASVs may be thrown out based on this default cut-off (Douglas et al. In Prep).

We will be using test files from the PICRUSt2 tutorial for this example. You can download these files using the below commands:

mkdir q2-picrust2_test_full

cd q2-picrust2_test_full

wget http://kronos.pharmacology.dal.ca/public_files/tutorial_datasets/picrust2_tutorial_files/mammal_biom.qza

wget http://kronos.pharmacology.dal.ca/public_files/tutorial_datasets/picrust2_tutorial_files/mammal_seqs.qza

wget http://kronos.pharmacology.dal.ca/public_files/tutorial_datasets/picrust2_tutorial_files/mammal_metadata.tsv

These files correspond to the ASV count table, ASV sequences, and metadata for the samples. There are 11 samples and 371 ASVs in total. These samples were collected from the mammalian stool of Arctic wolves, coyotes, beavers, and porcupines, as previously described.

You can run the full PICRUSt2 pipeline with the below command (should take ~25 min and 14 GB of RAM - this will be much faster if you can set more threads).

qiime picrust2 full-pipeline \
   --i-table mammal_biom.qza \
   --i-seq mammal_seqs.qza \
   --output-dir q2-picrust2_output \
   --p-threads 1 \
   --p-hsp-method pic \
   --p-max-nsti 2 \
   --verbose

Note in this case the pic the phylogenetic independent contrast hidden-state prediction method is indicated since it is fastest. However, we recommend that in practice users use the mp method (~54 min on 1 thread for this example).

The output artifacts of this command are the red boxes in the flowchart here.

These output files in (q2-picrust2_output) are:

  • ec_metagenome.qza - EC metagenome predictions (rows are EC numbers and columns are samples).
  • ko_metagenome.qza - KO metagenome predictions (rows are KOs and columns are samples).
  • pathway_abundance.qza - MetaCyc pathway abundance predictions (rows are pathways and columns are samples).

The artifacts are all of type FeatureTable[Frequency], which means they can be used with QIIME2 plugins that process and analyze these datatypes.

For instance, you can get summary information regarding the pathway abundance file with this command, which you can view like any QIIME2 visualization.

qiime feature-table summarize \
   --i-table q2-picrust2_output/pathway_abundance.qza \
   --o-visualization q2-picrust2_output/pathway_abundance.qzv

You should see a table like the one below, which describes the total pathway abundance per-sample (as well as other useful summaries). Note that this isn't the final result (that would be the FeatureTable[Frequency] files created above) you would analyze and is instead just a way to get some basic descriptions of the data.

Note that this file is not in units of relative abundance (e.g. percent) and is instead the sum of the predicted functional abundance contributed by each ASV multiplied by the abundance (the number of input reads) of each ASV.

The above metagenome predictions can be integrated into a number of QIIME2 analysis. For instance, you can quickly calculate diversity metrics based on these tables. The minimum sample pathway abundance found above was 239231, so we will rarify to this cut-off when calculating the core diversity metrics:

qiime diversity core-metrics \
   --i-table q2-picrust2_output/pathway_abundance.qza \
   --p-sampling-depth 239231 \
   --m-metadata-file mammal_metadata.tsv \
   --output-dir pathabun_core_metrics_out \
   --p-n-jobs 1

Take a look at pathabun_core_metrics_out/bray_curtis_emperor.qzv in the QIIME2 viewer. You should see a ordination plot like this, which shows how the clear difference between this dataset is between carnivores and rodents:

The final output QZA files are QIIME2 FeatureTable[Frequency] files, which can be used with existing QIIME2 programs. If you want to use the tables outside of QIIME2 you can run this command to convert the pathway abundance table to BIOM format:

qiime tools export \
   --input-path q2-picrust2_output/pathway_abundance.qza \
   --output-path pathabun_exported

This command will convert a BIOM file to plain-text:

biom convert \
   -i pathabun_exported/feature-table.biom \
   -o pathabun_exported/feature-table.biom.tsv \
   --to-tsv