generated from compbiocore/analysis_template

Commit 0632d6c (parent: a44f74f): Check in merging of nextflow_branch to main branch. Showing 54 changed files with 9,055 additions and 0 deletions.

## Dockerfile
```dockerfile
FROM staphb/pangolin:4.2-pdata-1.18

# Install software
RUN pip install snakemake
RUN apt-get update && \
    apt-get -y --no-install-recommends install --fix-missing \
    curl \
    nano \
    vim \
    git \
    unzip \
    r-base \
    build-essential \
    libssl-dev \
    libcurl4-openssl-dev \
    libxml2-dev
RUN cd /usr/local/bin/ && curl -fsSL "https://github.com/nextstrain/nextclade/releases/latest/download/nextclade-x86_64-unknown-linux-gnu" -o "./nextclade" && chmod +x ./nextclade
RUN cd /usr/local/bin/ && curl -fsSL "https://github.com/nextstrain/nextclade/releases/latest/download/nextalign-x86_64-unknown-linux-gnu" -o "./nextalign" && chmod +x ./nextalign
RUN apt-get -y install libssl-dev zlib1g-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev \
    libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
RUN R -e "install.packages(c('devtools', 'httr', 'XML', 'gsubfn'), dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN R -e "devtools::install_github('Wytamma/GISAIDR')"
RUN R -e "install.packages(c('lubridate', 'dplyr'))"
RUN R -e "install.packages(c('tidyr'))"
RUN wget --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
RUN tar -vxzf sratoolkit.tar.gz
RUN rm sratoolkit.tar.gz
RUN ln -s /data/sratoolkit.3.0.1-ubuntu64/bin/fastq-dump /usr/bin/fastq-dump
# cache-busting no-op
RUN echo "version 4"
RUN apt-get -y install libblas-dev libgfortran-10-dev liblapack-dev
RUN R -e "install.packages(c('seqinr', 'stringr', 'collections'), dependencies=TRUE, repos='http://cran.rstudio.com/')"
# cache-busting no-op
RUN echo "avoid cache"
COPY gisaid_download.R /data/gisaid_download.R
```
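For local testing, this image can be built and entered with Docker; the image tag `covid_pipeline` below is an arbitrary name chosen for illustration, not something defined by the repository:

```
# from the directory containing the Dockerfile and gisaid_download.R
docker build -t covid_pipeline .

# open an interactive shell inside the freshly built image
docker run --rm -it covid_pipeline bash
```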
# Running Workflow on OSCAR

The following documentation details how to run the Covid19 analysis pipeline specifically on Brown's Oscar cluster.

## Directory Structure

* **0_data:** an empty directory in which to download sequences and metadata from GISAID for analyses.
* **1_scripts:** contains shell scripts to run the pipeline, as reflected in `/covid19_analysis/1_scripts`. The singularity image can be pulled directly to Oscar or your local machine by running `singularity pull covid19.sif docker://ericsalomaki/covid_new_pango:05092023` from the `1_scripts` directory.
* **2_metadata:** contains the `Dockerfile` that was used to create the container for running the pipeline, a GFF file, a QC rules file, and the reference fasta and genbank files.
* **3_results:** will be created while the pipeline is running; results will be written to `/covid19_analysis/3_results/${YYYYMMDD}`.

## Running Pipeline via Oscar Slurm Batch Submission

To run the covid pipeline, navigate to `/PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/` and run:
```
sbatch run_slurm.sh /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta
```
Results will be produced in `/covid19_analysis/3_results/${YYYYMMDD}`.

A run with ~20,000 input sequences takes roughly 30 minutes to complete the primary pangolin analyses and produce figures on Oscar with 24 threads and 128G RAM allocated; however, the IQ-TREE analysis will run for several days. IQ-TREE writes checkpoints, so an incomplete analysis can be resumed beyond the allocated time if necessary.
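After submitting, the batch job can be monitored with standard Slurm commands; the job ID below is hypothetical, and the exact fields available depend on your site's Slurm configuration:

```
# list your queued and running jobs
squeue -u $USER

# inspect accounting info for a submitted or finished job (hypothetical job ID)
sacct -j 1234567 --format=JobID,State,Elapsed,MaxRSS
```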
## Running Pipeline via Oscar Interactive Session

To run the pipeline in an interactive session, first enter a screen session (`screen -S JOBNAME`) and then initiate an interactive session with enough resources (`interact -t 24:00:00 -n 24 -m 128G`).

Navigate to the `1_scripts` directory:
```
cd /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts
```

Enter the singularity container and mount the parent directory:

```
singularity exec -B /ABSOLUTE/PATH/TO/CLONED/REPO/covid19_analysis/ /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/covid19.sif bash
```

Once inside the container, run:

```
bash run.sh /ABSOLUTE/PATH/TO/SEQUENCE/DATA/covid_sequences.fasta
```

To detach from the screen session use `Ctrl + a`, then `d`; to return, use `screen -r JOBNAME`.

Results will be produced in `/PATH/TO/CLONED/REPO/covid19_analysis/3_results/${YYYYMMDD}`
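The `${YYYYMMDD}` component is the run date. As a minimal sketch (the directory name `3_results` matches the repo layout; the rest is illustrative, not the pipeline's actual code), such a date-stamped results path can be derived in shell like this:

```shell
# build a date-stamped results path, e.g. 3_results/20230509
RUN_DATE=$(date +%Y%m%d)
RESULTS_DIR="3_results/${RUN_DATE}"

# create the directory if it does not yet exist, then report it
mkdir -p "$RESULTS_DIR"
echo "$RESULTS_DIR"
```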
## Example Usage for Oscar
```
sbatch /PATH/TO/CLONED/REPO/covid19_analysis/1_scripts/run_slurm.sh /PATH/TO/CLONED/REPO/covid19_analysis/0_data/sequenceData.fasta
```
# Running Workflow via Nextflow

The following documentation details how to run the Covid19 analysis pipeline using Nextflow on any computing environment.

## Installation

### 1. Check out the Github repo
First, check out the Github repo:

```commandline
git clone https://github.com/compbiocore/covid19_analysis.git
```
|
||
### 2. Install Nextflow and Singularity | ||
|
||
#### Option A: On Any Computing Environment | ||
|
||
If you do not have Singularity already; you can install it by referring to the [Singularity installation guide](https://docs.sylabs.io/guides/3.0/user-guide/installation.html) here. | ||
|
||
If you do not have Nextflow already; you can install it by referring to the [Nextflow installation guide](https://www.nextflow.io/docs/latest/getstarted.html#installation) here. | ||
|
||
After installing Singularity, ensure that in your Nextflow configuration file, you have enabled Singularity in Nextflow. You can refer to the [Singularity configuration guide](https://www.nextflow.io/docs/edge/container.html#id24) here; or in another words, add the following block in the `nextflow.config` file that Nextflow is sourcing: | ||
```commandline | ||
... | ||
singularity { | ||
enabled = true | ||
} | ||
``` | ||
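A slightly fuller `nextflow.config` sketch is shown below; `autoMounts` and `cacheDir` are optional Nextflow singularity-scope settings added here for illustration (the cache path is an arbitrary example), not requirements of this pipeline:

```commandline
singularity {
    enabled    = true
    // bind host paths into the container automatically
    autoMounts = true
    // where pulled images are cached (illustrative path)
    cacheDir   = "$HOME/.singularity_cache"
}
```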
#### Option B: On Brown OSCAR Computing Environment

If you are on the Brown OSCAR computing environment, you can install Nextflow and Singularity by following the [set up instructions here](https://github.com/compbiocore/workflows_on_OSCAR). Then, to initialize the Nextflow environment, simply type:
```commandline
nextflow_start
```
## Running the Nextflow Workflow

Once you have finished installing (or already have the prerequisites satisfied), you can run the Nextflow pipeline with the following commands:

```
cd $PROJECT_REPO
nextflow run $PROJECT_REPO/workflows/covid19.nf \
    --output_dir $OUTPUT_DIR --username $GISAID_USER --password='$GISAID_PASSWORD' \
    --project_github $PROJECT_REPO
```
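For clarity, the environment variables used above might be set as follows before invoking the pipeline; every value here is a hypothetical placeholder to be replaced with your own paths and credentials:

```shell
# path to the cloned repository (placeholder)
export PROJECT_REPO=/path/to/covid19_analysis
# where workflow outputs will be written (placeholder)
export OUTPUT_DIR=/path/to/results
# GISAID credentials (placeholders)
export GISAID_USER=your_gisaid_username
export GISAID_PASSWORD=your_gisaid_password
```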
## Output Directory

Below is a brief walk-through and explanation of all the workflow work products:

#### Output 1: GISAID Sequence Files and Metadata

In `$OUTPUT_DIR/gisaid`:
- `gisaid.fasta`, the FASTA file containing all sequences downloaded from GISAID for a given geolocation (e.g., USA/Rhode Island).
- `gisaid.csv`, the GISAID metadata file for all the sequences for that geolocation.
- `sra_run.txt`, all of the SRA IDs linked to the GISAID sequences in this workflow.
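As an illustration of how `sra_run.txt` might be consumed downstream, the sketch below iterates over accessions; the accession values and the `fastq-dump` invocation are hypothetical, and the workflow's actual handling may differ:

```shell
# create a stand-in sra_run.txt with hypothetical SRA accessions
printf 'SRR0000001\nSRR0000002\n' > sra_run.txt

# iterate over each accession; here we only echo the command a
# download step might run (e.g. fastq-dump from sra-tools)
while read -r acc; do
    echo "fastq-dump --split-files $acc"
done < sra_run.txt
```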
#### Output 2: Analysis Files
## mkdocs.yml
```yaml
site_name: Computational Biology Core - Brown University
site_author: Paul Cao and Eric Salomaki
repo_url: https://github.com/compbiocore/covid19_analysis
site_description: Documentation for running Covid19 Analysis Workflow
site_url: https://compbiocore.github.io/covid19_analysis
google_analytics: ['UA-115983496-2', 'compbiocore.github.io']

theme:
  name: material
  feature:
    tabs: true
  palette:
    primary: 'blue grey'
    accent: 'indigo'
  logo: assets/images/cbc.svg
```