Skip to content

Commit

Permalink
Workflow docs (#180)
Browse files Browse the repository at this point in the history
* Update doc requirements

* Add sphinx pdf embed

* Update meta-recipe info with `--dir-path` parameter from predict-path

* Add workflow docs
  • Loading branch information
mikecormier authored Jan 4, 2021
1 parent a3c9115 commit 76b329c
Show file tree
Hide file tree
Showing 13 changed files with 1,806 additions and 28 deletions.
File renamed without changes.
1 change: 1 addition & 0 deletions docs/doc_pip_requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
pip install git+git://github.com/SuperKogito/sphinxcontrib-pdfembed
1 change: 1 addition & 0 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ def setup(app):
'sphinx.ext.autosummary',
'sphinx.ext.napoleon',
'sphinx_autodoc_typehints', # must be loaded after napoleon
'sphinxcontrib.pdfembed', ## pdf embedder
'celery.contrib.sphinx',
]

Expand Down
1 change: 1 addition & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -452,5 +452,6 @@ Contents:
meta-recipes
contribute
private_recipes
workflows
recipes

4 changes: 3 additions & 1 deletion docs/source/making-meta-recipes.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
.. _contribute-meta-recipe:

Creating a ggd meta-recipe
--------------------------
==========================

[:ref:`Click here to return to the home page <home-page>`]

This page is specific to creating a ggd **meta-recipe**. If you are looking to create a normal ggd recipe see :ref:`Creating a ggd recipe <contrib-recipe>`

Expand Down
4 changes: 2 additions & 2 deletions docs/source/meta-recipes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ Most GGD commands work with meta-recipes. These include:
- :code:`ggd show-env` to show the environment variables for ID specific meta-recipe(s)
- :code:`ggd pkg-info` to get the metadata/information of a ID specific meta-recipe
- :code:`ggd get-files` to get the installed data files for an ID specific meta-recipe
- :code:`ggd predict-path` to predict the directory path of an ID specific meta-recipe
- :code:`ggd uninstall` to uninstall an ID specific meta-recipe(s)
- :code:`ggd make-meta-recipe` to make a meta-recipe from a single or group of scripts
- :code:`ggd check-recipe` to test and check a new meta-recipe
Expand All @@ -41,7 +42,6 @@ Most GGD commands work with meta-recipes. These include:
Commands the don't work with meta-recipes include:

- :code:`ggd make-recipe`: This command will only created a GGD recipe not a GGD meta-recipe.
- :code:`ggd predict-path`: Currently, the `predict-path` command does not work with meta-recipes.


Conda Environments and Prefix
Expand Down Expand Up @@ -321,7 +321,7 @@ ID specific meta-recipe data files can be accessed just like any other GGD recip
5) PREFIX
As mentioned above, any GGD command that can use the :code:`--prefix` parameter can be used with meta-recipes. (Excluding the :code:`predict-path` command)
As mentioned above, any GGD command that can use the :code:`--prefix` parameter can be used with meta-recipes.
Therefore, one can install and access the data files of a ID specific meta-recipe in an environment that is different then the active one. Additionally,
a meta-recipes metadata can be accessed in a different environment using :code:`ggd pkg-info` with the :code:`--prefix` parameter.
Expand Down
342 changes: 342 additions & 0 deletions docs/source/nextflow_workflow.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,342 @@
.. _nextflow_workflow:


Nextflow Workflow with GGD
==========================

[:ref:`Click here to return to the home page <home-page>`]

This simple example shows one approach of using GGD for a Nextflow workflow.

In this example, we will use GGD to install a data package prior to running the nextflow workflow and using the data files
during the nextflow workflow process. This is one of many approaches that can be taken to us GGD in a Nextflow workflow

For information on Nextflow workflows see `Nextflow Workflow docs <https://www.nextflow.io/docs/latest/index.html>`_.

**Description:**
In this example we are going to be using the `SeqCover <https://github.com/brentp/seqcover>`_ tool to interactively
evaluate and QC the coverage of a few genes in a few 1000G samples. To do this, we are going to be using the
nextflow workflow `seqcover-nf <https://github.com/brwnj/seqcover-nf>`_ created by Joe Brown.

Nextflow Workflow Files
-----------------------

The files for this Nextflow workflow are provided below for convenience. However, the workflow is available `here <https://github.com/brwnj/seqcover-nf>`_.

**nextflow.config**

- This is the nextflow config file used to set the different configs for running the workflow.

.. code-block:: yaml
// Configurable variables
params {
outdir = './results'
cpus = 4
percentile = 5
}
process {
time = 12.h
memory = 8.GB
cpus = 1
cache = 'lenient'
}
profiles {
docker {
docker.enabled = true
}
singularity {
singularity.runOptions = '--bind /scratch'
singularity.enabled = true
}
none {}
}
process.shell = ['/bin/bash', '-euo', 'pipefail']
timeline {
enabled = true
file = "${params.outdir}/logs/timeline.html"
}
report {
enabled = true
file = "${params.outdir}/logs/report.html"
}
trace {
enabled = true
file = "${params.outdir}/logs/trace.txt"
}
manifest {
name = 'brwnj/seqcover-nf'
author = 'Joe Brown'
description = 'generate depth report per sample per gene'
version = '0.1.0'
nextflowVersion = '>=20.10.0'
homePage = 'https://github.com/brwnj/seqcover-nf'
mainScript = 'main.nf'
}
**main.nf**

- This file is the main nextflow workflow file

- The workflow is as follows:

- Run mosdepth to get per base coverage for each sample

- Create a coverage background for the cohort using seqcover

- Generate the seqcover report of the combined samples


.. code-block:: python
nextflow.enable.dsl=2
params.help = false
if (params.help) {
log.info """
-----------------------------------------------------------------------
seqcover-nf
===========
Documentation and issues can be found at:
https://github.com/brwnj/seqcover-nf
seqcover is available at:
https://github.com/brentp/seqcover
Required arguments:
-------------------
--crams Aligned sequences in .bam and/or .cram format.
Indexes (.bai/.crai) must be present.
--reference Reference FASTA. Index (.fai) must exist in same
directory.
--genes Comma separated list of genes across which to
show coverage, e.g. "PIGA,KCNQ2,ARX,DNM1,SLC25A22,CDKL5".
Options:
--------
--outdir Base results directory for output.
Default: '/.results'
--cpus Number of cpus dedicated to `mosdepth` calls.
Default: 4
--percentile Background percentile used in seqcover report.
More info is available at:
https://github.com/brentp/seqcover#outlier
Default: 5
-----------------------------------------------------------------------
""".stripIndent()
exit 0
}
params.crams = false
params.reference = false
params.outdir = './results'
params.cpus = 4
params.percentile = 5
params.genes = false
params.hg19 = false
if(!params.crams) {
exit 1, "--crams argument like '/path/to/*.cram' is required"
}
if(!params.reference) {
exit 1, "--reference argument is required"
}
if(!params.genes) {
exit 1, "--genes argument, e.g. 'PIGA,KCNQ2,ARX,DNM1', is required"
}
crams = channel.fromPath(params.crams)
crais = crams.map { it -> it + ("${it}".endsWith('.cram') ? '.crai' : '.bai') }
process mosdepth {
container "brwnj/seqcover-nf:v0.1.0"
publishDir "${params.outdir}/mosdepth"
cpus params.cpus
input:
path(cram)
path(crai)
path(reference)
output:
path("*.d4"), emit: d4
script:
"""
mosdepth -f $reference -x -t ${task.cpus} --d4 ${cram.getSimpleName()} $cram
"""
}
process seqcover_background {
container "brwnj/seqcover-nf:v0.1.0"
publishDir params.outdir
input:
path(d4)
path(reference)
val(percentile)
output:
path("seqcover/*.d4"), emit: d4
script:
"""
seqcover generate-background -p 5 -f $reference -o seqcover/ $d4
"""
}
process seqcover_report {
container "brwnj/seqcover-nf:v0.1.0"
publishDir params.outdir
input:
path(d4)
path(background)
path(reference)
val(genes)
val(hg19)
output:
path("*.html"), emit: html
script:
genome_flag = hg19 ? "--hg19" : ""
"""
seqcover report --fasta $reference --background $background --genes $genes $genome_flag $d4
"""
}
workflow {
mosdepth(crams, crais, params.reference)
seqcover_background(mosdepth.output.d4.collect(), params.reference, params.percentile)
seqcover_report(mosdepth.output.d4.collect(), seqcover_background.output.d4, params.reference, params.genes, params.hg19)
}
Steps to run the workflow
--------------------------

1. Grab 1000G bam files.

5 bams and their indexes from 1000G to represent our alignments

.. code-block:: bash
mkdir data && cd data
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00097/alignment/HG00097.chrom20.SOLID.bfast.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00182/alignment/HG00182.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00183/alignment/HG00183.chrom20.ILLUMINA.bwa.FIN.low_coverage.20101123.bam.bai
echo done
2. Use GGD to install a reference genome

We see from the header that 1000G uses GRCh37. Using ggd search we can find our reference:

.. code-block:: bash
ggd search -g GRCh37 reference genome
Among the listings, we see the reference we need with install instructions:

.. code-block:: bash
----------------------------------------------------------------------------------------------------
grch37-reference-genome-1000g-v1
================================
Summary: GRCh37 reference genome from 1000 genomes
Species: Homo_sapiens
Genome Build: GRCh37
Keywords: ref, reference, fasta-file
Data Version: phase2_reference
.
.
.
To install run:
ggd install grch37-reference-genome-1000g-v1
----------------------------------------------------------------------------------------------------
We install our reference:

.. code-block:: bash
ggd install grch37-reference-genome-1000g-v1
# activate environmental variables
source activate base
3. Run the Nextflow workflow

Now we have everything to QC our reads using a Nextflow workflow for seqcover.

.. note::

We are using the :code:`$ggd_grch37_reference_genome_1000g_v1_file` environment variable created by GGD when the
data package was installed.

.. code-block:: bash
GENES="MYL9,TLDC2,NNAT,ADIG,FAM83D,PTPRT,SGK2,HNF4A"
nextflow run brwnj/seqcover-nf -revision main -profile docker \
--reference $ggd_grch37_reference_genome_1000g_v1_file \
--crams 'data/*.bam' \
--genes $GENES --hg19
The Nextflow output gives:

.. code-block:: bash
N E X T F L O W ~ version 20.10.0
Launching `brwnj/seqcover-nf` [pedantic_jennings] - revision: 8bba84f42f [main]
executor > local (7)
[bb/2fddf4] process > mosdepth (3) [100%] 5 of 5 ✔
[72/2ea7b3] process > seqcover_background [100%] 1 of 1 ✔
[e5/8d2432] process > seqcover_report [100%] 1 of 1 ✔
Completed at: 25-Nov-2020 16:30:28
Duration : 12m 50s
CPU hours : 0.6
Succeeded : 7
And we have our results in ./results/seqcover_report.html.


Results:
--------

Here is the **seqcover_report.html** output from the above workflow

.. raw:: html

<iframe src="_static/seqcover_report.html" height="1100px" width="100%"></iframe>








Loading

0 comments on commit 76b329c

Please sign in to comment.