Check in merging of nextflow_branch to main branch
paulcao-brown committed Oct 18, 2023
1 parent a44f74f commit 0632d6c
Showing 54 changed files with 9,055 additions and 0 deletions.
33 changes: 33 additions & 0 deletions 2_metadata/GISAID_Download_Dockerfile
@@ -0,0 +1,33 @@
FROM staphb/pangolin:4.2-pdata-1.18

#Install software
RUN pip install snakemake
RUN apt-get update && \
apt-get -y --no-install-recommends install --fix-missing \
curl \
nano \
vim \
git \
unzip \
r-base \
build-essential \
libssl-dev \
libcurl4-openssl-dev \
libxml2-dev
RUN cd /usr/local/bin/ && curl -fsSL "https://github.com/nextstrain/nextclade/releases/latest/download/nextclade-x86_64-unknown-linux-gnu" -o "./nextclade" && chmod +x ./nextclade
RUN cd /usr/local/bin/ && curl -fsSL "https://github.com/nextstrain/nextclade/releases/latest/download/nextalign-x86_64-unknown-linux-gnu" -o "./nextalign" && chmod +x ./nextalign
RUN apt-get -y install libssl-dev zlib1g-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev \
libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
RUN R -e "install.packages(c('devtools', 'httr', 'XML', 'gsubfn'), dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN R -e "devtools::install_github('Wytamma/GISAIDR')"
RUN R -e "install.packages(c('lubridate', 'dplyr'))"
RUN R -e "install.packages(c('tidyr'))"
RUN wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
RUN tar -vxzf sratoolkit.tar.gz
RUN rm sratoolkit.tar.gz
RUN ln -s /data/sratoolkit.3.0.1-ubuntu64/bin/fastq-dump /usr/bin/fastq-dump
RUN echo "version 4"
RUN apt-get -y install libblas-dev libgfortran-10-dev liblapack-dev
RUN R -e "install.packages(c('seqinr', 'stringr', 'collections'), dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN echo "avoid cache"
COPY gisaid_download.R /data/gisaid_download.R
28 changes: 28 additions & 0 deletions 2_metadata/documentation/docs/assets/images/cbc.svg
52 changes: 52 additions & 0 deletions 2_metadata/documentation/docs/index.md
@@ -0,0 +1,52 @@
# Running Workflow on OSCAR

This documentation describes how to run the Covid19 analysis pipeline specifically on Brown's Oscar cluster.

## Directory Structure

* **0_data:** an empty directory into which sequences and metadata from GISAID are downloaded for analysis.
* **1_scripts:** contains the shell scripts that run the pipeline, as reflected in ```/covid19_analysis/1_scripts```. The Singularity image can be pulled directly to Oscar or your local machine by running ```singularity pull covid19.sif docker://ericsalomaki/covid_new_pango:05092023``` from the `1_scripts` directory.
* **2_metadata:** contains the ```Dockerfile``` that was used to create the container for running the pipeline, a GFF file, a QC rules file, the reference FASTA file, and a GenBank file.
* **3_results:** is created while the pipeline is running; results are written to ```/covid19_analysis/3_results/${YYYYMMDD}```
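
As a rough sketch of how the dated results path can be built, the helper below assumes that `${YYYYMMDD}` stands for the run date as printed by `date +%Y%m%d`. That is an assumption; the pipeline scripts may derive the stamp differently. The function name `results_dir` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: print the dated results path for a given repo root.
# Assumes ${YYYYMMDD} is the run date formatted by `date +%Y%m%d`.
results_dir() {
    repo_root="$1"
    stamp="${2:-$(date +%Y%m%d)}"    # optional explicit date, otherwise today
    # Strip one trailing slash from the repo root so the path has no "//".
    printf '%s/3_results/%s\n' "${repo_root%/}" "$stamp"
}
```

For example, `ls "$(results_dir /PATH/TO/CLONED/REPO/covid19_analysis)"` would list today's results if the assumption holds.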


## Running Pipeline via Oscar Slurm Batch Submission

To run the covid pipeline, navigate to ```/PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/``` and run:
```
sbatch run_slurm.sh /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta
```
Results will be produced in ```/covid19_analysis/3_results/${YYYYMMDD}```
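
Because the job can run for days, it may be worth validating the input before submitting. The following is a minimal sketch, not part of the repository: `check_fasta` is a hypothetical helper that only confirms the argument is an absolute path to a readable, non-empty file whose first line looks like a FASTA header.

```shell
# Hypothetical pre-flight check for the FASTA argument passed to run_slurm.sh.
check_fasta() {
    case "$1" in
        /*) ;;                                            # absolute path: ok
        *)  echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    # The file must exist, be readable, and be non-empty.
    [ -r "$1" ] && [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
    # The first line of a FASTA file should be a '>' header.
    head -n 1 "$1" | grep -q '^>' || { echo "no FASTA header: $1" >&2; return 1; }
}
```

A possible use is `check_fasta /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta && sbatch run_slurm.sh /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta`.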

A run with ~20,000 input sequences takes roughly 30 minutes to complete the primary pangolin analyses and produce figures on Oscar with 24 threads and 128G RAM allocated; however, the IQ-TREE analysis will run for several days. IQ-TREE writes checkpoints, so an analysis that does not finish within the allocated time can be continued afterwards if necessary.


## Running Pipeline via Oscar Interactive Session

To run the pipeline in an interactive session, first start a screen session with `screen -S JOBNAME`, then start an `interact` session with enough resources (`interact -t 24:00:00 -n 24 -m 128G`).

Navigate to the `1_scripts` directory:
```
cd /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts
```

Enter the singularity container and mount the parent directory:

```
singularity exec -B /ABSOLUTE/PATH/TO/CLONED/REPO/covid19_analysis/ /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/covid19.sif bash
```

Once inside the container, run:

```
bash run.sh /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta
```

To detach from the screen, press `Ctrl + a` followed by `d`; to return, use `screen -r JOBNAME`.

Results will be produced in `/PATH/TO/CLONED/REPO/covid19_analysis/3_results/${YYYYMMDD}`

## Example Usage for Oscar
```
sbatch /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/run_slurm.sh /PATH/TO/CLONED/REPO/covid19_analysis/0_data/sequenceData.fasta
```
61 changes: 61 additions & 0 deletions 2_metadata/documentation/docs/workflow.md
@@ -0,0 +1,61 @@
# Running Workflow via Nextflow

This documentation describes how to run the Covid19 analysis pipeline using Nextflow in any computing environment.

## Installation

### 1. Clone the GitHub repo
First, clone the GitHub repo:

```commandline
git clone https://github.com/compbiocore/covid19_analysis.git
```

### 2. Install Nextflow and Singularity

#### Option A: On Any Computing Environment

If you do not already have Singularity, you can install it by following the [Singularity installation guide](https://docs.sylabs.io/guides/3.0/user-guide/installation.html).

If you do not already have Nextflow, you can install it by following the [Nextflow installation guide](https://www.nextflow.io/docs/latest/getstarted.html#installation).

After installing Singularity, make sure Singularity is enabled in your Nextflow configuration. You can refer to the [Singularity configuration guide](https://www.nextflow.io/docs/edge/container.html#id24); in other words, add the following block to the `nextflow.config` file that Nextflow is sourcing:
```commandline
...
singularity {
enabled = true
}
```
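
To confirm that a given configuration file actually enables Singularity, a quick text check can help. The sketch below is an illustration only: `singularity_enabled` is a hypothetical helper, and it does a plain pattern match on the file rather than asking Nextflow for its resolved configuration, so it will miss settings that come from included files or profiles.

```shell
# Hypothetical check: succeed if the config file sets singularity enabled = true,
# written either as a block (singularity { enabled = true }) or in dotted form
# (singularity.enabled = true).
singularity_enabled() {
    # Collapse all whitespace, including newlines, into single spaces so that
    # a multi-line block can be matched as one line.
    tr -s '[:space:]' ' ' < "$1" |
        grep -Eq 'singularity[ ]*(\{[^}]*|\.)[ ]*enabled[ ]*=[ ]*true'
}
```

For example, `singularity_enabled nextflow.config && echo "Singularity is enabled"`.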

#### Option B: On Brown OSCAR Computing Environment

If you are on the Brown OSCAR computing environment, you can install Nextflow and Singularity by following the [set up instructions here](https://github.com/compbiocore/workflows_on_OSCAR). Then, to initialize the Nextflow environment, run:
```commandline
nextflow_start
```


## Running the Nextflow Workflow

Once installation is complete (or if the prerequisites are already satisfied), you can run the Nextflow pipeline with the following command:

```
cd $PROJECT_REPO
nextflow run $PROJECT_REPO/workflows/covid19.nf \
--output_dir $OUTPUT_DIR --username $GISAID_USER --password='$GISAID_PASSWORD' \
--project_github $PROJECT_REPO
```
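
One detail worth checking when filling in the command: in a POSIX shell, single quotes suppress variable expansion. If `$GISAID_PASSWORD` is meant to be a shell variable rather than a placeholder replaced by hand, the single-quoted form passes the literal text `$GISAID_PASSWORD`. The sketch below shows the difference; the password value is a made-up example, and storing credentials in shell variables is an assumption about how the command is used.

```shell
# Example value only; not a real credential.
GISAID_PASSWORD='p@ss word$1'

# Single quotes: the variable name is passed through literally.
literal=$(printf '%s' --password='$GISAID_PASSWORD')

# Double quotes: the variable is expanded, and spaces or '$' characters
# inside the value are preserved as a single argument.
expanded=$(printf '%s' --password="$GISAID_PASSWORD")

printf '%s\n%s\n' "$literal" "$expanded"
```

If the password is typed directly into the command instead, single quotes around the literal password remain the safer choice, because they protect characters such as `$` and `!` from the shell.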

## Output Directory

Below is a brief walk-through and explanation of the workflow's outputs:

#### Output 1: GISAID Sequence Files and Metadata

In `$OUTPUT_DIR/gisaid`:
- `gisaid.fasta`, the sequence file containing all sequences downloaded from GISAID for a given geolocation (e.g., USA/Rhode Island).
- `gisaid.csv`, the GISAID metadata file for all the sequences from that geolocation.
- `sra_run.txt`, all of the SRA IDs linked to the GISAID sequences in this workflow.
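
A quick sanity check on these files is to compare the number of sequences with the number of metadata rows. This is a sketch under stated assumptions: that `gisaid.csv` has exactly one header row, contains no line breaks inside quoted fields, and holds one row per sequence. None of these is confirmed by the workflow documentation, and the function names are hypothetical.

```shell
# Count FASTA records by counting header lines that start with '>'.
count_fasta_records() { grep -c '^>' "$1" || true; }

# Count CSV data rows, skipping one assumed header row.
count_csv_rows() { tail -n +2 "$1" | wc -l | tr -d ' '; }

# Hypothetical check: succeed only if both counts agree.
check_gisaid_outputs() {
    dir="$1"
    n_fasta=$(count_fasta_records "$dir/gisaid.fasta")
    n_csv=$(count_csv_rows "$dir/gisaid.csv")
    echo "fasta records: $n_fasta, metadata rows: $n_csv"
    [ "$n_fasta" -eq "$n_csv" ]
}
```

It could be run as `check_gisaid_outputs "$OUTPUT_DIR/gisaid"`; a mismatch is a prompt to inspect the files rather than proof of an error.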

#### Output 2: Analysis Files

15 changes: 15 additions & 0 deletions 2_metadata/documentation/mkdocs.yml
@@ -0,0 +1,15 @@
site_name: Computational Biology Core - Brown University
site_author: Paul Cao and Eric Salomaki
repo_url: https://github.com/compbiocore/covid19_analysis
site_description: Documentation for running Covid19 Analysis Workflow
site_url: https://compbiocore.github.io/covid19_analysis
google_analytics: ['UA-115983496-2', 'compbiocore.github.io']

theme:
  name: material
  feature:
    tabs: true
  palette:
    primary: 'blue grey'
    accent: 'indigo'
  logo: assets/images/cbc.svg