Running the analysis
It can be useful to check what will happen if you start the ARMOR workflow with your current setup. To do this, run
snakemake --use-conda -npr
or, if you do not want to use conda,
snakemake -npr
The -n parameter triggers a dry run: nothing is executed, and Snakemake only displays what would be done. The -p parameter prints the shell commands that the pipeline would execute, which is useful for checking that file paths are correct. The -r parameter prints the reason why each rule would be executed (e.g. a missing output file or an input file with a newer timestamp).
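If you want to review the dry run before every real run, the two steps can be wrapped in a small shell function. The sketch below is a hypothetical convenience helper (preview_then_run is not part of ARMOR or Snakemake): it passes its arguments to snakemake once with -npr added, asks for confirmation, and only then repeats the same call without the dry-run flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ARMOR: dry-run first, then run on request.
# All arguments are passed unchanged to snakemake (e.g. --use-conda --cores 1).
preview_then_run() {
  # Dry run: -n (no execution), -p (print shell commands), -r (print reasons)
  snakemake "$@" -npr || return
  local answer
  # bash shows the prompt only when input comes from a terminal
  read -r -p "Run the workflow now? [y/N] " answer
  if [[ $answer == [yY] ]]; then
    snakemake "$@"
  else
    echo "Aborted; nothing was executed." >&2
    return 1
  fi
}
```

For example, preview_then_run --use-conda --cores 1 shows the planned jobs and commands and starts the workflow only if you answer y.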
If all the paths and individual configurations are defined in the config.yaml file (see configuration) and conda is available (see managing software), the workflow can be run from the command line with
snakemake --use-conda --cores 1
Snakemake will create a conda environment from the envs/environment.yaml file and activate it before executing each rule that has a conda directive. If you are an experienced conda user, you can specify a different environment for each rule within the Snakefile, and Snakemake will activate the correct environment for each rule.
If you want to run a specific rule, just do
snakemake --use-conda --cores 1 <ruleName>
If you want to use multiple cores, use
snakemake --use-conda --cores 12
This sets the maximum number of CPU cores used simultaneously when running the workflow. By default, it uses only 1 CPU core.
Note that setting --cores when calling snakemake is different from setting the ncores parameter in the config.yaml file. The latter sets the number of cores used by a single job. If this number is lower than the number provided to --cores, several jobs can be executed in parallel. For example, if we set ncores: 2 in the configuration file and run the workflow with --cores 12, Snakemake can run up to 6 jobs in parallel for the multi-threaded rules (FastQC, STAR, Salmon and DRIMSeq), with each job using 2 cores (thanks @matrs for suggesting this clarification).
You can look at an example here for more details.
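The arithmetic behind this example can be written as a tiny shell function. It is only a back-of-the-envelope sketch: it assumes every multi-threaded job requests exactly ncores threads, and it relies on Snakemake scaling a job's threads down to the --cores value when ncores is larger. The result is an upper bound for those rules, since other jobs may occupy some of the cores. max_parallel_jobs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Upper bound on how many ncores-thread jobs fit into --cores at the same time.
# Assumption: each multi-threaded job requests exactly ncores threads.
max_parallel_jobs() {
  local cores=$1   # value passed to snakemake --cores
  local ncores=$2  # ncores from config.yaml
  if (( ncores < 1 )); then
    echo "ncores must be >= 1" >&2
    return 1
  fi
  # Snakemake never gives a job more threads than --cores allows
  if (( ncores > cores )); then
    ncores=$cores
  fi
  # Integer division: leftover cores cannot host another full job
  echo $(( cores / ncores ))
}

max_parallel_jobs 12 2   # prints 6
max_parallel_jobs 12 5   # prints 2 (two cores remain for other jobs)
```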
First make sure the paths to all your input files are specified correctly in the config.yaml file (see here). Relative paths will be interpreted relative to the Snakefile directory! To run the workflow from outside the folder containing the Snakefile (and all the scripts), specify the path to the Snakefile and the path to the folder containing it:
snakemake --use-conda --cores 1 -s <path-to-Snakefile> -d <workdir>
where <workdir> is the directory containing the Snakefile.
If you want to use a config.yaml file that is not located in the Snakefile directory, you can specify it with the --configfile parameter. Run the workflow from the Snakefile directory with
snakemake --use-conda --cores 1 --configfile <path-to-config.yaml>
Or see above for how to run the workflow from an arbitrary directory.
After setting up your conda environment and system R installation (read first here), activate the environment and, from within it, run the pipeline with snakemake --cores 1. For multiple cores, use snakemake --cores 12.
In case you have all the necessary software in your path (see here) and you don't want to use conda, simply run the workflow without the --use-conda parameter:
snakemake --cores 1
Summary: If you do not want to use conda to manage your software (i.e. run modes 2 and 3 of Managing software), simply omit the --use-conda parameter from the example commands.
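As a compact overview of how the options above combine, the hypothetical helper below (armor_cmd is not part of ARMOR) assembles a snakemake command line and prints it for review instead of running it. It follows the rule described above that -d should point at the directory containing the Snakefile; the paths in the usage example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ARMOR: print a snakemake command line.
#   $1  use conda? ("yes" or "no")
#   $2  number of cores for --cores
#   $3  optional path to the Snakefile (adds -s and -d <its directory>)
#   $4  optional path to a config file (adds --configfile)
armor_cmd() {
  local use_conda=$1 cores=$2 snakefile=${3:-} configfile=${4:-}
  local -a cmd=(snakemake)
  if [[ $use_conda == yes ]]; then
    cmd+=(--use-conda)
  fi
  cmd+=(--cores "$cores")
  if [[ -n $snakefile ]]; then
    # The working directory is the folder that contains the Snakefile
    cmd+=(-s "$snakefile" -d "$(dirname "$snakefile")")
  fi
  if [[ -n $configfile ]]; then
    cmd+=(--configfile "$configfile")
  fi
  echo "${cmd[*]}"
}
```

For example, armor_cmd no 12 /path/to/ARMOR/Snakefile /path/to/config.yaml prints snakemake --cores 12 -s /path/to/ARMOR/Snakefile -d /path/to/ARMOR --configfile /path/to/config.yaml. Because the words are simply joined with spaces, paths containing spaces are not quoted in the printed line.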
If invoked as described above, snakemake will execute a rule if its output is out of date with respect to its input, as determined by the timestamps of the corresponding files. To force re-execution when parameters defined in the config.yaml file have changed, call snakemake as:
snakemake --use-conda --cores 1 -R $(snakemake --list-params-changes)
--list-params-changes will list the files that use any of the updated parameters, and -R will force their regeneration. See here for more details.
However, if you want to re-execute the entire pipeline with an updated version of ARMOR and/or conda, we recommend starting a new analysis. Simply put, do not update ARMOR in the middle of an analysis.
Use
snakemake --cores 1 -D > summary.txt
to generate a detailed summary of your workflow's output files after the run has finished (or to see the status of the output files at any time) without re-running the workflow. As explained in the snakemake manual, the -D (or --detailed-summary) flag prints a summary of all files created by the workflow, with the following columns: filename, modification time, rule version, input file(s), shell command, status, plan. One useful aspect of this is that you can easily retrieve the shell command that was used to generate each output file.
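For example, to list each output file next to the command that produced it, the summary file can be filtered with awk. This is a sketch under assumptions: it expects summary.txt to be tab-separated with a header line, and the shell-command column to have a header containing "shell" (e.g. shellcmd). Column names and order differ between Snakemake versions, which is why the column is located by name rather than by position. shell_commands is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print "output file <TAB> shell command" for each row of a detailed summary.
# Assumptions: tab-separated file, header on line 1, and a header name
# containing "shell" for the shell-command column.
shell_commands() {
  local summary=${1:-summary.txt}
  awk -F'\t' '
    NR == 1 {
      # Locate the shell-command column by its header name
      for (i = 1; i <= NF; i++) if ($i ~ /shell/) col = i
      if (!col) { print "no shell-command column found" > "/dev/stderr"; exit 1 }
      next
    }
    # Column 1 holds the output file name
    { print $1 "\t" $col }
  ' "$summary"
}
```

After creating summary.txt as shown above, run shell_commands summary.txt.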
You can also use
snakemake --cores 1 --report report.html
to generate a nice visual report of run times and statistics of your workflow run.