Managing software
All software packages required by the workflow can be installed in conda
environments, using the provided environment files (envs/environment.yaml and envs/environment_R.yaml).
The environment files contain version numbers of all the required software, reflecting the combination of software versions with which the workflow has been tested. In order to install and use a newer version of any of the software packages, modify the version numbers in the environment file accordingly.
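As a rough illustration of changing a pinned version, the shell sketch below rewrites one pin in an environment file. It assumes the file lists dependencies in the usual conda form "- name=version". The helper name set_pin and the package and version in the usage comment are hypothetical placeholders, not taken from the ARMOR files.

```shell
# Hypothetical helper: replace the version pinned for one package.
# Usage: set_pin <environment-file> <package> <new-version>
# Assumes dependency lines look like "  - package=version".
set_pin() {
  env_file=$1
  package=$2
  version=$3
  tmp_file=$(mktemp) || return 1
  # Rewrite only the line whose package name matches exactly (up to "=").
  if sed "s/^\([[:space:]]*-[[:space:]]*$package\)=.*/\1=$version/" \
      "$env_file" > "$tmp_file"; then
    # Copy back instead of moving, so the file keeps its permissions.
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
  else
    rm -f "$tmp_file"
    return 1
  fi
}

# Example (placeholder package and version):
# set_pin envs/environment.yaml somepackage 1.2.3
```

An environment that was already created from the old file is not changed by this edit and would need to be created again.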
The diagram above indicates the three different ways in which you can manage the software for the ARMOR workflow: using conda, using a manually created conda environment together with a system R installation, or manually installing all required software. If you want to use a system R installation, you will have to adjust some parameters in config.yaml and the .Renviron file (see "Using a system R installation" below).
You can manage the software in the following 3 different ways:
First, ensure that conda is available and, if necessary, add the channels r, conda-forge and bioconda (see e.g. here).
Also make sure Snakemake is installed. You can use conda install -c bioconda -c conda-forge snakemake or do a global installation as described here.
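The channel setup can also be scripted. The sketch below wraps conda config --add channels, an existing conda subcommand; the function name add_channels and the CONDA override variable are illustrative assumptions. Because --add places each channel at the top of the list, the channel added last ends up with the highest priority, so the order in the usage comment is only an example.

```shell
# Conda executable to call. Set CONDA=echo for a dry run that only prints
# the arguments instead of changing your configuration.
CONDA=${CONDA:-conda}

# Hypothetical helper: add each given channel to the conda configuration.
add_channels() {
  for channel in "$@"; do
    "$CONDA" config --add channels "$channel" || return 1
  done
}

# Example: add the three channels named in the text.
# add_channels r conda-forge bioconda
```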
We recommend installing all software, including R and the required R packages, with conda.
The setup Snakemake rule can be used to create the required conda environments and install the necessary R packages:
snakemake --use-conda setup --cores 1
You can use your system R installation and still manage all other software with conda. To do so, first configure your system R as described below ("Using a system R installation").
Then, you can create the conda environment and install the required software and R packages with
snakemake --use-conda setup --cores 1
You can use your own R installation (see below) in combination with a manually created conda environment to run all the pre-processing steps. This might be useful if you are planning to analyze multiple datasets, because the installation of the required R packages in the R conda environment takes a long time. You can manually create a conda environment with
conda env create --name ARMOR --file envs/environment.yaml
Activate it with conda activate ARMOR, or with source activate ARMOR for conda versions under 4.6.
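The version rule above can be expressed as a small shell function. This is only a sketch: the function name activation_cmd is hypothetical, and it assumes the version string has the form major.minor.patch.

```shell
# Hypothetical helper: print the activation command that matches a given
# conda version. $1: version string such as "4.5.12"; $2: environment name.
activation_cmd() {
  version=$1
  env_name=${2:-ARMOR}
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # text before the next dot
  if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 6 ]; }; then
    echo "source activate $env_name"
  else
    echo "conda activate $env_name"
  fi
}

# Example, assuming "conda --version" prints a line like "conda 4.10.3":
# activation_cmd "$(conda --version | awk '{print $2}')"
```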
After configuring your system R as described below (needed for this run-mode), install the required R packages from within the environment (ARMOR) with
snakemake setup --cores 1
To check that you are in the environment, run conda info --envs (the active environment is marked with an asterisk). To exit the environment, use conda deactivate.
For more details on managing environments see here.
In some situations (e.g., Mac OS X), installing R packages inside conda R is difficult. In these cases or if one simply prefers to use a system R installation, the following modifications can be used:
- Change the value of the useCondaR variable in config.yaml to False.
- Set the path to the system R binary in the Rbin variable in config.yaml.
- Set the path to the desired R library in the .Renviron file. This can be an existing library directory or an empty directory. If packages need to be installed, write access to this directory is needed.
- If the library directory defined in the previous step does not exist, create it.
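The last two steps can be sketched in shell as follows. This is an illustration under assumptions: it selects the library through R_LIBS_USER, a standard R environment variable, which may differ from the variable used in the .Renviron file shipped with the workflow, so check that file first. The function name set_r_library is hypothetical, and the function overwrites any existing .Renviron in the given directory.

```shell
# In config.yaml (edited by hand; the Rbin path is specific to your setup):
#   useCondaR: False
#   Rbin: <path to your system R binary>

# Hypothetical helper: create the R library directory if needed and point
# .Renviron at it.
# $1: directory holding .Renviron (the workflow root); $2: R library path.
set_r_library() {
  workflow_dir=$1
  r_lib=$2
  mkdir -p "$r_lib" || return 1
  # Installing packages requires write access to the library directory.
  [ -w "$r_lib" ] || { echo "no write access to $r_lib" >&2; return 1; }
  # Assumption: the library is selected via R_LIBS_USER.
  printf 'R_LIBS_USER="%s"\n' "$r_lib" > "$workflow_dir/.Renviron"
}

# Example (placeholder path):
# set_r_library . "$HOME/R/armor-library"
```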
NOTE: You can use your own R with any of the run-modes, but it is required for run-modes 1b, 2 and 3.
If you don't want to use conda, you have to make sure that all necessary software is installed and available in the path. The following software is used by the workflow:
- R
- Salmon
- FastQC
- TrimGalore!
- Cutadapt
- STAR
- samtools
- MultiQC
- bedtools
- bedGraphToBigWig (select your operating system from this page and download the executable)
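A quick way to see which of these tools are not yet on the path is a loop over their command names. The sketch below is a minimal example; the executable names (for instance trim_galore for TrimGalore! and multiqc for MultiQC) are assumptions based on the usual command names of these tools, and the function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names are assumptions based on the tools' usual command names.
missing_tools R salmon fastqc trim_galore cutadapt STAR samtools multiqc \
  bedtools bedGraphToBigWig
```

An empty output means all listed commands were found. It does not check their versions.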
You also need to make sure that all necessary R packages are installed. The workflow uses the following packages:
- Biostrings
- tximport
- tximeta
- limma
- edgeR
- reshape2
- tibble
- dplyr
- ggplot2
- tidyr
- rtracklayer
- msigdbr
- SummarizedExperiment
- SingleCellExperiment
- rmarkdown
- DRIMSeq (optional)
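Whether these packages are already installed can be checked from the shell by asking R to load each namespace. The following is a sketch rather than part of the workflow: the function name missing_r_packages and the RSCRIPT override variable are hypothetical, while requireNamespace and quit are standard R functions.

```shell
# Rscript executable to call; can be overridden, e.g. with a full path.
RSCRIPT=${RSCRIPT:-Rscript}

# Hypothetical helper: print every R package from the argument list that
# cannot be loaded by the R installation behind $RSCRIPT.
missing_r_packages() {
  for pkg in "$@"; do
    "$RSCRIPT" -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" \
      >/dev/null 2>&1 || echo "$pkg"
  done
}

# Example with the packages listed above (DRIMSeq is optional):
# missing_r_packages Biostrings tximport tximeta limma edgeR reshape2 tibble \
#   dplyr ggplot2 tidyr rtracklayer msigdbr SummarizedExperiment \
#   SingleCellExperiment rmarkdown
```

If Rscript itself cannot be run, every package is reported as missing.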
These lists of software and R packages can also be found in envs/environment.yaml and scripts/install_pkgs.R, respectively.
This diagram shows the three different run-modes explained above. In the "Requirements" boxes on the left, the required files and software are highlighted in yellow. Specific parameters that need to be adjusted are listed with each file. The parts highlighted in green are the path to your system R binary and the desired R library. These paths are specific to your setup and need to be filled in accordingly! The boxes in the middle column list the code that needs to be run in order to install the software and to run the pipeline.
The workflow contains two rules to check the versions of the software that was used. To check the versions of R packages, run snakemake --use-conda --cores 1 listpackages (or snakemake --cores 1 listpackages if you are managing the software separately). This parses the output files generated by R CMD BATCH, extracts all R packages that were used, and writes the results to a text file. To check the versions of other software, run snakemake --use-conda --cores 1 softwareversions (or snakemake --cores 1 softwareversions). Finally, the log subdirectory of the specified output directory contains log files, which should state the versions of all software that was used.
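The only difference between the command variants above is the --use-conda flag. As a small sketch, a helper can assemble the right command for a given rule; the function name snakemake_cmd and its yes/no argument are hypothetical conventions chosen for this example.

```shell
# Hypothetical helper: print the Snakemake command for a rule, with or
# without --use-conda.
# $1: rule name (e.g. setup, listpackages, softwareversions)
# $2: "yes" if conda manages the software, anything else otherwise
snakemake_cmd() {
  rule=$1
  if [ "$2" = "yes" ]; then
    echo "snakemake --use-conda --cores 1 $rule"
  else
    echo "snakemake --cores 1 $rule"
  fi
}

snakemake_cmd listpackages yes
snakemake_cmd softwareversions no
```

The function only prints the command, so it can be inspected before it is run.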
ARMOR has been tested on macOS and Linux systems.