Merge pull request #181 from kbseah/master
Update documentation for installation and database setup
HRGV authored May 8, 2023
2 parents 889ebc1 + 927cc2d commit 2518a8c
Showing 7 changed files with 239 additions and 215 deletions.
97 changes: 61 additions & 36 deletions README.md
@@ -1,6 +1,6 @@
<img align="right" src="docs/phyloFlash_logo.png" width="200" alt="phyloFlash logo"/>

# phyloFlash v3.4
# phyloFlash

[![GitHub (pre-)release](https://img.shields.io/github/release/HRGV/phyloflash/all.svg?label=Latest%20Version)]()
[![Bioconda](https://img.shields.io/conda/vn/Bioconda/phyloFlash.svg)](https://bioconda.github.io/recipes/phyloflash/README.html)
@@ -12,34 +12,31 @@ by Harald Gruber-Vodicka, Elmar A. Pruesse, and Brandon Seah.
phylogenetic composition of an Illumina (meta)genomic or transcriptomic
dataset. **[Manual](https://hrgv.github.io/phyloFlash)**

***NOTE*** Version 3 changed some input options and also how mapping-based taxa
(NTUs) are handled. Please download the last release of v2.0 ([tar.gz
archive](https://github.com/HRGV/phyloFlash/archive/v2.0-beta6.tar.gz)) for the
old implementation. No changes have been made to the database setup, so
databases prepared for v2.0 can still be used for v3.0.

Read [our paper](https://doi.org/10.1128/mSystems.00920-20) on phyloFlash.


## Quick-start

### Download via Conda

We recommend installing phyloFlash and its dependencies using Conda or Mamba.
[Conda](https://conda.io/docs/) is a package manager that will also install
dependencies that are required if you don't have them already.

phyloFlash is distributed through the [Bioconda](http://bioconda.github.io/)
channel on Conda.

According to the [Conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html),
it is recommended to install all packages at the same time to avoid dependency
conflicts, and to create new environments instead of installing to the base
environment.
avoid installing new packages to your base environment but create new
environments for them as required. Also, specify all desired packages at the
same time when creating a new environment, instead of adding them sequentially,
to avoid dependency conflicts.

We also recommend using [Mamba](https://mamba.readthedocs.io/en/latest/) as a
We also suggest using [Mamba](https://mamba.readthedocs.io/en/latest/) as a
drop-in substitute for Conda. It implements a more effective dependency solver
and is also the default Conda frontend for the pipeline managers Snakemake.
Conda sometimes fails to solve the environment, and in these cases Mamba
usually works.
and is also the default Conda frontend for the pipeline manager Snakemake.
Simply replace `conda` with `mamba` in the commands below. Note that the
`defaults` channel should be enabled.

```bash
# If you haven't set up Bioconda already
@@ -49,39 +46,58 @@ conda config --add channels conda-forge
# Try the following step if "solving environment" does not terminate
conda config --set channel_priority strict
# Create new environment named "pf" with phyloflash
# Sortmerna is an optional dependency
conda create -n pf phyloflash sortmerna=2.1b
# If Conda is unable to solve the environment; requires mamba in base env
mamba create -n pf phyloflash sortmerna=2.1b
conda create -n pf phyloflash
# Activate environment
conda activate pf
# Check that dependencies all installed properly
phyloFlash.pl -check_env
```
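
The advice to replace `conda` with `mamba` can be sketched as a small helper that picks whichever frontend is on the `PATH`. This is only an illustration: the function name `pick_frontend` is hypothetical and not part of phyloFlash, and the final command is printed rather than run.

```bash
# Hypothetical helper: prefer mamba when it is installed, otherwise fall back to conda.
pick_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

FRONTEND=$(pick_frontend)
# Print the environment-creation command instead of running it,
# so the sketch has no side effects.
echo "${FRONTEND} create -n pf phyloflash"
```

Remove the `echo` on the last line to actually create the environment.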

### Download from GitHub

If you prefer not to use Conda, or are interested in a specific version that is
not distributed there, you can download releases from the
[releases](https://github.com/HRGV/phyloFlash/releases) page on GitHub.
### Download pre-formatted database

If you clone the repository directly off GitHub you might end up with a version
that is still under development.
Pre-formatted databases derived from SILVA releases 138 onwards are available
from the following Zenodo archives:

```bash
# Download latest release
wget https://github.com/HRGV/phyloFlash/archive/pf3.4.tar.gz
tar -xzf pf3.4.tar.gz
* [SILVA 138.1](https://doi.org/10.5281/zenodo.7892521) (latest)
* [SILVA 138](https://doi.org/10.5281/zenodo.7890453)

# Check for dependencies and install them if necessary
cd phyloFlash-pf3.4
./phyloFlash.pl -check_env
Download, checksum, and unpack:

```bash
wget https://zenodo.org/record/7892522/files/138.1.tar.gz # 5.5 GB download
tar -xzf 138.1.tar.gz # unpacks folder 138.1/ in the current location
```
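
The block above downloads and unpacks the archive but does not show the checksum step. A minimal sketch follows, assuming an MD5 checksum for the archive is available from its Zenodo record page and that `md5sum` (GNU coreutils) is installed. The function name `verify_md5` is hypothetical, and the expected value is a placeholder to fill in yourself.

```bash
# Hypothetical helper: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
    local file=$1 expected=$2 actual
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (placeholder checksum, copy the real value from the record page):
# verify_md5 138.1.tar.gz "<md5 from Zenodo>" && tar -xzf 138.1.tar.gz
```

Chaining the unpack step with `&&` means the archive is extracted only if the digest matches.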

### Set up database and run
Specify path to the database folder with the option `-dbhome` when running
phyloFlash (see below).

This assumes that the phyloFlash scripts are already in your path.
Older versions of the SILVA database have a more restrictive license, so we are
unable to distribute pre-formatted versions. You will have to download the
original SILVA files and run the `phyloFlash_makedb.pl` script yourself (see
Manual).


### Test phyloFlash with test dataset

Test data are included with phyloFlash. The following assumes that you
installed phyloFlash to a Conda environment called `pf`, and that the database
files have been unpacked to a folder `/path/to/138.1`. By default, phyloFlash
will look for the database folder in the folder where it is installed. If it is
located somewhere else, specify this to the `-dbhome` option.

```bash
# Install reference database (takes some time)
phyloFlash_makedb.pl --remote
conda activate pf # If Conda environment not already activated
phyloFlash.pl -dbhome /path/to/138.1 -lib TEST -CPUs 16 \
-read1 ${CONDA_PREFIX}/lib/phyloFlash/test_files/test_F.fq.gz \
-read2 ${CONDA_PREFIX}/lib/phyloFlash/test_files/test_R.fq.gz \
-almosteverything
```


### Example phyloFlash commands

```bash
# Run with test data and 16 processors (default is to use all processors available)
phyloFlash.pl -lib TEST -CPUs 16 -read1 test_files/test_F.fq.gz -read2 test_files/test_R.fq.gz

@@ -116,6 +132,7 @@ read sets.
Use the `-zip` switch to compress output files into tar.gz archive, and `-log`
to save run messages to a log file


## Output

phyloFlash screens metagenomic or metatranscriptomic reads for SSU rRNA
@@ -133,6 +150,7 @@ Plain text and HTML-formatted reports are produced, reporting summary
statistics from each run. The HTML report includes an interactive graphical
summary.


## Going further

The phyloFlash suite also includes other tools for SSU rRNA-centric metagenome
@@ -149,9 +167,13 @@ analyses. Run the commands without arguments to see help messages.
and extract contigs connected to them. Optionally compare to phyloFlash
results from the same library.


## Manual

For further information **please refer to the [Manual](https://hrgv.github.io/phyloFlash)**.
For further information please refer to the
[Manual](https://hrgv.github.io/phyloFlash) as well as the command-line help
page `phyloFlash.pl -man`.


## Versions and changes

@@ -199,6 +221,7 @@ For further information **please refer to the [Manual](https://hrgv.github.io/ph
* No change to heatmap script for comparing multiple samples
* v2.0 complete rewrite


## Contact

Please report any problems to the [phyloFlash Google
@@ -210,12 +233,14 @@ issue tracker.
We also welcome any feedback on the software and its documentation, especially
suggestions for improvement!


## Acknowledgements

We thank colleagues and phyloFlash users who have contributed to phyloFlash
development by testing the software, reporting bugs, and suggesting new
features.


## Citation

If you use phyloFlash for a publication, please cite our paper in _mSystems_:
93 changes: 54 additions & 39 deletions docs/index.md
@@ -8,12 +8,6 @@ layout: home
phylogenetic composition of an Illumina (meta)genomic or transcriptomic
dataset.

***NOTE*** Version 3 changes some input options and also how mapping-based taxa
(NTUs) are handled. Please download the last release of v2.0 ([tar.gz
archive](https://github.com/HRGV/phyloFlash/archive/v2.0-beta6.tar.gz)) for the
old implementation. No changes have been made to the database setup, so
databases prepared for v2.0 can still be used for v3.0.

This manual explains how to install and use phyloFlash. Navigate from the menu
bar above or the table of contents below.

@@ -33,22 +27,24 @@ You may read more about the pipeline design and application in our

### Download via Conda

We recommend installing phyloFlash and its dependencies using Conda or Mamba.
[Conda](https://conda.io/docs/) is a package manager that will also install
dependencies that are required if you don't have them already. phyloFlash is
distributed through the [Bioconda](http://bioconda.github.io/) channel on
Conda.
dependencies that are required if you don't have them already.

phyloFlash is distributed through the [Bioconda](http://bioconda.github.io/)
channel on Conda.

According to the [Conda
documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html),
it is recommended to install all packages at the same time to avoid dependency
conflicts, and to create new environments instead of installing to the base
environment.
According to the [Conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html),
avoid installing new packages to your base environment but create new
environments for them as required. Also, specify all desired packages at the
same time when creating a new environment, instead of adding them sequentially,
to avoid dependency conflicts.

We also recommend using [Mamba](https://mamba.readthedocs.io/en/latest/) as a
We also suggest using [Mamba](https://mamba.readthedocs.io/en/latest/) as a
drop-in substitute for Conda. It implements a more effective dependency solver
and is also the default Conda frontend for the pipeline managers Snakemake.
Conda sometimes fails to solve the environment, and in these cases Mamba
usually works.
and is also the default Conda frontend for the pipeline manager Snakemake.
Simply replace `conda` with `mamba` in the commands below. Note that the
`defaults` channel should be enabled.

```bash
# If you haven't set up Bioconda already
@@ -58,39 +54,58 @@ conda config --add channels conda-forge
# Try the following step if "solving environment" does not terminate
conda config --set channel_priority strict
# Create new environment named "pf" with phyloflash
# sortmerna is an optional dependency
conda create -n pf phyloflash sortmerna=2.1b
# If Conda is unable to solve the environment; requires mamba in base env
mamba create -n pf phyloflash sortmerna=2.1b
conda create -n pf phyloflash
# Activate environment
conda activate pf
# Check that dependencies all installed properly
phyloFlash.pl -check_env
```

### Download from GitHub

If you prefer not to use Conda, or are interested in a specific version that is
not distributed there, you can download releases from the
[releases](https://github.com/HRGV/phyloFlash/releases) page on GitHub.
### Download pre-formatted database

If you clone the repository directly off GitHub you might end up with a version
that is still under development.
Pre-formatted databases derived from SILVA releases 138 onwards are available
from the following Zenodo archives:

```bash
# Download latest release
wget https://github.com/HRGV/phyloFlash/archive/pf3.4.tar.gz
tar -xzf pf3.4.tar.gz
* [SILVA 138.1](https://doi.org/10.5281/zenodo.7892521) (latest)
* [SILVA 138](https://doi.org/10.5281/zenodo.7890453)

# Check for dependencies and install them if necessary
cd phyloFlash-pf3.4
./phyloFlash.pl -check_env
Download, checksum, and unpack:

```bash
wget https://zenodo.org/record/7892522/files/138.1.tar.gz # 5.5 GB download
tar -xzf 138.1.tar.gz # unpacks folder 138.1/ in the current location
```

### Set up database and run
Specify path to the database folder with the option `-dbhome` when running
phyloFlash (see below).

Older versions of the SILVA database have a more restrictive license, so we are
unable to distribute pre-formatted versions. You will have to download the
original SILVA files and run the `phyloFlash_makedb.pl` script yourself (see
Manual).

This assumes that the phyloFlash scripts are already in your path.

### Test phyloFlash with test dataset

Test data are included with phyloFlash. The following assumes that you
installed phyloFlash to a Conda environment called `pf`, and that the database
files have been unpacked to a folder `/path/to/138.1`. By default, phyloFlash
will look for the database folder in the folder where it is installed. If it is
located somewhere else, specify this to the `-dbhome` option.

```bash
# Install reference database (takes some time)
phyloFlash_makedb.pl --remote
conda activate pf # If Conda environment not already activated
phyloFlash.pl -dbhome /path/to/138.1 -lib TEST -CPUs 16 \
-read1 ${CONDA_PREFIX}/lib/phyloFlash/test_files/test_F.fq.gz \
-read2 ${CONDA_PREFIX}/lib/phyloFlash/test_files/test_R.fq.gz \
-almosteverything
```


### Example phyloFlash commands

```bash
# Run with test data and 16 processors (default is to use all processors available)
phyloFlash.pl -lib TEST -CPUs 16 -read1 test_files/test_F.fq.gz -read2 test_files/test_R.fq.gz
