Working with "Web of Life" (WoL)

The "Web of Life" (WoL) project is a series of efforts to reconstruct an accurate reference phylogeny for microbial genomes, and to build resources that can benefit (and are already benefiting) microbiome researchers.

Phase I of the project has been completed (Zhu et al., 2019). We have released a reference tree of 10,575 bacterial and archaeal genomes, built from 381 marker genes.

The project is detailed at our website, https://biocore.github.io/wol/, which includes data and metadata, code, protocols, a gallery and a visualizer. Large data files are hosted at our Globus endpoint: WebOfLife (see instructions).

This public resource provides everything one needs to start microbiome data analysis using WoL, including raw sequence data, metadata, tree and taxonomy, and pre-built databases that are ready to be plugged into your bioinformatics protocols. Currently, we provide databases for QIIME 2, SHOGUN, Bowtie2, Centrifuge, Kraken2 / Bracken, BLASTn and BLASTp, Minimap2, and DIAMOND. Even if your favorite tool is not on this list, we provide detailed tutorials on how to build your own database, along with many other related protocols. WoL is also hosted at our web-based microbiome study platform, Qiita (https://qiita.ucsd.edu/) (see details).

The following tutorial assumes that you have downloaded the entire WoL directory from our Globus server. The paths mentioned below are relative to this directory.

Sequence alignment

First, you need to align your sequences (namely your FastQ / Fast5 / BAM files) against the WoL database using an aligner of your choice. Let's take Bowtie2 for example. Our bioinformatics tool, SHOGUN, provides a Bowtie2 wrapper optimized for shotgun metagenomic datasets:

shogun align -d databases/shogun -a bowtie2 -t 16 -p 0.95 -i input.fa -o .

This will generate a SAM format alignment file.
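Before moving on, it can help to check what the alignment file contains. The following is a minimal sketch, not part of the WoL protocol, that tallies alignments per reference sequence with awk. It relies only on the standard SAM layout (header lines start with "@", the third column is the reference name, and "*" marks an unaligned record). The file names and the read and genome IDs are made up for illustration.

```shell
# Write a tiny made-up SAM file (three alignments, two hypothetical reference IDs).
# Columns follow the SAM layout: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...
printf '@HD\tVN:1.0\tSO:unsorted\n' > demo.sam
printf 'r1\t0\tG1\t10\t42\t5M\t*\t0\t0\tACGTA\t*\n' >> demo.sam
printf 'r2\t16\tG1\t50\t42\t5M\t*\t0\t0\tACGTA\t*\n' >> demo.sam
printf 'r3\t0\tG2\t7\t42\t5M\t*\t0\t0\tACGTA\t*\n' >> demo.sam

# Skip header lines (starting with "@") and unaligned records (RNAME "*"),
# then count alignments per reference sequence (column 3).
awk -F'\t' '!/^@/ && $3 != "*" { n[$3]++ } END { for (g in n) print g "\t" n[g] }' \
  demo.sam | sort > demo_counts.tsv
cat demo_counts.tsv
```

To check a real alignment, replace demo.sam with your own SAM file. This is only a rough sanity check; it does not reproduce how Woltka counts reads.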

The alignment step has been automated in Qiita. If you use Qiita, the SAM file is ready for download.

[Note] You can also run Bowtie2 manually with parameters of your choice, or use other aligners and other databases. Woltka is designed for flexibility.

gOTU analysis

woltka classify -i input.sam -o output.biom

Note that you can compress the SAM file to save disk space, and Woltka can parse compressed files.
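As a minimal sketch of the compression step, the commands below write a gzip copy of a SAM file and confirm that it decompresses to the same content before the original is discarded. The file name and record are placeholders, and gzip is used on the assumption that it is among the formats your Woltka version can read.

```shell
# Placeholder SAM record for illustration only (hypothetical read and genome IDs).
printf 'r1\t0\tG1\t10\t42\t5M\t*\t0\t0\tACGTA\t*\n' > demo2.sam

# Write a compressed copy while keeping the original for comparison.
gzip -c demo2.sam > demo2.sam.gz

# Verify that the compressed copy decompresses to identical content.
if gzip -dc demo2.sam.gz | cmp -s - demo2.sam; then
  echo "compressed copy verified"
fi
```

Once verified, the compressed file can be passed directly as input, for example: woltka classify -i input.sam.gz -o output.biom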

Free-rank classification

Use the original NCBI taxonomy:

woltka classify \
  --input input.sam \
  --map taxonomy/taxid.map \
  --nodes taxonomy/nodes.dmp \
  --names taxonomy/names.dmp \
  --output output.biom

Use lineage strings extracted from NCBI. This loses some resolution, but the results are more structured and will look familiar to QIIME 2 users:

woltka classify \
  --input input.sam \
  --lineage taxonomy/lineage.txt \
  --output output.biom
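To illustrate what a lineage string carries, here is a small sketch that pulls the genus out of one line. It assumes a two-column, tab-delimited layout (subject ID, then a lineage with rank prefixes such as "g__" separated by "; "), as commonly used with QIIME 2. Check taxonomy/lineage.txt for the actual layout. The genome ID below is hypothetical.

```shell
# One made-up line in the assumed layout: subject ID, tab, lineage string.
# "G000000001" is a hypothetical genome ID; the lineage is an E. coli-style example.
printf 'G000000001\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Enterobacteriaceae; g__Escherichia; s__Escherichia coli\n' > demo_lineage.txt

# Split the lineage on "; " and print the genus-level entry (prefix "g__"),
# with the three-character prefix removed.
awk -F'\t' '{
  n = split($2, taxa, "; ")
  for (i = 1; i <= n; i++)
    if (taxa[i] ~ /^g__/) print $1 "\t" substr(taxa[i], 4)
}' demo_lineage.txt > demo_genus.tsv
cat demo_genus.tsv
```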

We also provide original and curated NCBI and GTDB taxonomies to choose from.

Classification at specific ranks

Modify the command slightly by adding the desired ranks:

woltka classify \
  --input input.sam \
  --map taxonomy/taxid.map \
  --nodes taxonomy/nodes.dmp \
  --names taxonomy/names.dmp \
  --rank phylum,genus,species \
  --output output_dir

Coordinates-based functional classification using MetaCyc

mcdir=annotation/metacyc
woltka classify \
  --input input.sam \
  --coords annotation/coords.txt.xz \
  --map annotation/uniref.map.xz \
  --map $mcdir/protein.map --names $mcdir/protein.names \
  --map $mcdir/protein2enzrxn.map --names $mcdir/enzrxn.names \
  --map $mcdir/enzrxn2reaction.map --names $mcdir/reaction.names \
  --map $mcdir/reaction2pathway.map --names $mcdir/pathway.names \
  --map $mcdir/pathway2class.map --names $mcdir/class.names \
  --map-as-rank \
  --rank protein,enzrxn,reaction,pathway,class \
  --output output_dir
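The series of map files in the command above appears to form a chain, in which the target of one mapping is the source of the next (protein to enzymatic reaction to reaction to pathway to class). The sketch below illustrates that idea on two made-up, tab-delimited maps using awk. It is a conceptual illustration only, not Woltka's implementation, and all file names and IDs are hypothetical.

```shell
# Two made-up tab-delimited maps (all IDs are hypothetical).
printf 'protA\trxn1\nprotB\trxn2\n' > demo_protein2reaction.map
printf 'rxn1\tpwyX\nrxn2\tpwyX\n'   > demo_reaction2pathway.map

# Load the reaction-to-pathway map into memory, then walk the
# protein-to-reaction map and follow each reaction to its pathway.
awk -F'\t' 'NR == FNR { pwy[$1] = $2; next }
            ($2 in pwy) { print $1 "\t" $2 "\t" pwy[$2] }' \
  demo_reaction2pathway.map demo_protein2reaction.map > demo_chain.tsv
cat demo_chain.tsv
```

Each output line shows one protein followed through both levels, which is why a read assigned to a protein can also be counted at the reaction and pathway levels.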

Stratified taxonomic / functional classification

Suppose you want to stratify functional annotations by genus (taxonomy). First, run taxonomic classification at the genus level, and export read-to-genus maps:

woltka classify \
  --input input.sam \
  ...
  --rank genus \
  --name-as-id \
  --output genus.biom \
  --outmap map_dir

Second, run functional annotation, adding the read-to-genus maps for stratification:

woltka classify \
  --input input.sam \
  --coords annotation/coords.txt.xz \
  ...
  --stratify map_dir \
  --output output_dir
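Conceptually, stratification pairs each read's genus with its functional assignment and counts the combinations. The sketch below shows this with awk on two made-up read maps. It is an illustration under stated assumptions, not Woltka's algorithm: the file names, IDs, and the "|" separator in the combined labels are all chosen for the example.

```shell
# Made-up read-to-genus and read-to-function maps (hypothetical IDs).
printf 'r1\tEscherichia\nr2\tEscherichia\nr3\tVibrio\n' > demo_read2genus.tsv
printf 'r1\tfuncA\nr2\tfuncA\nr3\tfuncB\n' > demo_read2func.tsv

# Load the genus of each read, then pair it with the read's function
# and count each genus/function combination.
awk -F'\t' 'NR == FNR { genus[$1] = $2; next }
            ($1 in genus) { n[genus[$1] "|" $2]++ }
            END { for (k in n) print k "\t" n[k] }' \
  demo_read2genus.tsv demo_read2func.tsv | sort > demo_stratified.tsv
cat demo_stratified.tsv
```

Reads without a genus assignment are dropped in this sketch. How the real tool treats such reads is not covered here.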
