DolphinNext is an easy-to-use platform for creating, deploying, and executing complex Nextflow pipelines for high-throughput data processing.
- A drag-and-drop user interface to build Nextflow pipelines
- Reproducible pipelines with version tracking
- Seamless portability to different computing environments with containerization
- Simplified pipeline sharing using GitHub (github.com)
- Support for continuous integration and tests (travis-ci.org)
- Easy re-execution of pipelines by copying a previous run's settings
- Integrated data analysis and reporting interface with R markdown support
- Reusability
- Reproducibility
- Shareability
- Easy execution
- Easy monitoring
- Easy reporting
This tutorial will help you understand the basics of DolphinNext, learn how to use the pipeline builder for different objectives, and familiarize yourself with Nextflow and some standard software packages for such analyses.
- Before you start
- Getting Started
- Exercise 1: Creating processes
- Exercise 2: Building a pipeline
- Exercise 3: Running a pipeline
If you prefer, you can click on the video links to follow the tutorial in a video.
Video Summary:
- Exercise 1: Creating processes
- Exercise 2: Building a pipeline at 2:43
- Exercise 3: Running a pipeline at 4:34
DolphinNext can be run standalone using a Docker container. Unless you want to use the prebuilt image from Docker Hub, the Docker image needs to be built first; likewise, any changes in the Dockerfile require rebuilding the image. In this tutorial, we will pull the image from Docker Hub and start the container.
- Pull DolphinNext-studio
docker pull ummsbiocore/dolphinnext-studio
- We keep the database outside of the container so that changes to the database persist every time you start the container. Please choose a directory on your machine to mount. For example, we will use the ~/export directory for this purpose.
sudo mkdir -p ~/export
- Run the container:
docker run --privileged -m 10G -p 8080:80 -v ~/export:/export -ti ummsbiocore/dolphinnext-studio /bin/bash
- After you start the container, you need to start the MySQL and Apache servers using the command below:
startup
- Now you can open your browser and access DolphinNext using the URL below:
http://localhost:8080/dolphinnext
This guide will walk you through creating Nextflow processes in DolphinNext, building new Nextflow pipelines with the DolphinNext pipeline builder, and executing them.
First, you need to access the DolphinNext web page at http://localhost:8080/dolphinnext and click the Sign Up or Sign in with Google button. You will be asked to enter some information such as your institution and username.
Once you log in, you will be the administrator of this instance. You can add more users to your system and manage them from the profile/admin section.
Once logged in, click the Pipelines tab at the top left of the screen to access the pipeline builder page.
A process is the basic programming element in Nextflow for running user scripts. Please click here to learn more about Nextflow's processes.
A process usually has input, output, and script sections. In this tutorial, you will see sections that include the necessary information to define a process, shown on the left side of the picture below. Please use that information to fill in the "Add New Process" form shown in the middle section of the picture. DolphinNext will then convert this information into a Nextflow process, shown on the right side of the picture. Once a process is created, it can be used in the pipeline builder; an example of how it looks there is shown at the bottom left of the picture. The mapping between the sections is shown in colored rectangles.
- FastQC process
- Hisat2 process
- RSeQC process
You'll notice several buttons in the left menu. New processes are created by clicking the green "New process" button.
a. First, please click the green "New process" button to open the "Add New Process" modal.
b. Please enter FastQC for the process name and define a new "Menu Group".
c. The FastQC process has one input, one output, and a one-line command to execute fastqc. Please use the information below to fill in the "Add New Process" form.
Name: "FastQC"
Menu Group: "Tutorial"
Inputs:
reads(fastq,set) name: val(name),file(reads)
Outputs:
outputFileHTML(html,file) name: "*.html"
Script:
fastqc ${reads}
d. Let's select the input and output parameters (reads and outputFileHTML) and define their "Input Names" that we are going to use in the script section.
e. Let's enter the script section
f. Press the "Save changes" button at the bottom of the modal to create the process. Now this process is ready to use; we will use it in Exercise 2.
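For reference, the form above corresponds roughly to the Nextflow (DSL1) process sketched below. DolphinNext generates the actual code for you; the channel wiring here is illustrative:

```nextflow
// Sketch only: DolphinNext generates the real process from the form.
process FastQC {
    input:
    set val(name), file(reads) from reads          // reads(fastq,set)

    output:
    file "*.html" into outputFileHTML              // outputFileHTML(html,file)

    script:
    """
    fastqc ${reads}
    """
}
```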
Let's create Hisat2 process.
a. First, please click the green "New process" button to open the "Add New Process" modal.
b. Inputs, outputs, and the script should be defined as below:
Name: "Hisat2"
Menu Group: "Tutorial"
Inputs:
reads(fastq,set) name: val(name),file(reads)
hisat2IndexPrefix(val) name: hisat2Index
Outputs:
mapped_reads(bam,set) name: val(name), file("${name}.bam")
outputFileTxt(txt,file) name: "${name}.align_summary.txt"
Script:
hisat2 -x ${hisat2Index} -U ${reads} -S ${name}.sam &> ${name}.align_summary.txt
samtools view -bS ${name}.sam > ${name}.bam
c. After you select the input and output parameters (hisat2IndexPrefix, mapped_reads, and outputFileTxt), add their names and enter the script. The page should look like this:
d. Please save changes before you close the screen.
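As with FastQC, the form corresponds roughly to the Nextflow (DSL1) process sketched below; DolphinNext generates the actual code, so the channel names here are illustrative:

```nextflow
// Sketch only: two inputs (reads and the Hisat2 index prefix), two outputs.
process Hisat2 {
    input:
    set val(name), file(reads) from reads                     // reads(fastq,set)
    val hisat2Index from hisat2IndexPrefix                    // hisat2IndexPrefix(val)

    output:
    set val(name), file("${name}.bam") into mapped_reads      // mapped_reads(bam,set)
    file "${name}.align_summary.txt" into outputFileTxt       // outputFileTxt(txt,file)

    script:
    """
    hisat2 -x ${hisat2Index} -U ${reads} -S ${name}.sam &> ${name}.align_summary.txt
    samtools view -bS ${name}.sam > ${name}.bam
    """
}
```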
a. First, please click the green "New process" button to open the "Add New Process" modal.
b. The form should be filled in using the information below:
Name: "RSeQC"
Menu Group: "Tutorial"
Inputs:
mapped_reads(bam,set) name: val(name), file(bam)
bedFile(bed,file) name: bed
Outputs:
outputFileTxt(txt,file) name: "RSeQC.${name}.txt"
Script:
read_distribution.py -i ${bam} -r ${bed} > RSeQC.${name}.txt
c. After you select the input and output parameters, enter their names and the script. The page should look like this:
d. Please, save changes before you close the screen.
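Again for reference, the RSeQC form corresponds roughly to the Nextflow (DSL1) process below; the channel names are illustrative, since DolphinNext generates the real code:

```nextflow
// Sketch only: consumes Hisat2's mapped_reads channel plus a BED file.
process RSeQC {
    input:
    set val(name), file(bam) from mapped_reads     // mapped_reads(bam,set)
    file bed from bedFile                          // bedFile(bed,file)

    output:
    file "RSeQC.${name}.txt" into outputFileTxt    // outputFileTxt(txt,file)

    script:
    """
    read_distribution.py -i ${bam} -r ${bed} > RSeQC.${name}.txt
    """
}
```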
Here Exercise 1 is finished. Please move to Exercise 2 to build the pipeline using the processes you created in Exercise 1.
Once logged in, click the Pipelines button at the top left of the screen. You'll notice the Enter Pipeline Name box just below the Pipelines button.
Note: If you could not finish Exercise 1, please import the RNA-Seq-Tutorial.dn file from GitHub using your pipeline builder; the processes defined in Exercise 1 will then appear in your left menu so you can use them while doing Exercise 2.
Before you start building the pipeline make sure you have the processes available in your menu.
a. Please enter a name for your pipeline, e.g. "RNA-Seq-Tutorial", select your menu group "public pipelines", and press the save button.
b. Please drag and drop FastQC, Hisat2 and RSeQC to your workspace;
c. Please drag and drop three Input parameters and change their names to "Input_Reads", "Hisat2_Index" and "bedFile" and connect them to their processes;
d. Connect your Hisat2 process to the RSeQC process using the mapped_reads parameter in both. You will observe that when the types match, you can connect two processes using their matching input and output parameters.
e. Drag & Drop three "output parameters" from the side bar and name them "FastQC_output", "Hisat2_Summary", and "RSeQC_output" and connect them to their corresponding processes. While naming, click their "Publish to Web Directory" and choose the right output format according to the output type of the process.
f. Overall pipeline should look like below;
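Conceptually, the connections you just drew correspond to channel wiring like the sketch below. DolphinNext generates this code for you; the params defaults and channel construction here are illustrative assumptions:

```nextflow
// Illustrative sketch only: DolphinNext generates the real wiring.
params.Input_Reads  = ""   // hypothetical defaults; set at run time
params.Hisat2_Index = ""
params.bedFile      = ""

// Each input parameter becomes a channel feeding its process(es).
Channel
    .fromPath(params.Input_Reads)
    .map { f -> tuple(f.baseName, f) }
    .into { reads_fastqc; reads_hisat2 }                   // consumed by FastQC and Hisat2

bedFile           = Channel.fromPath(params.bedFile)       // consumed by RSeQC
hisat2IndexPrefix = Channel.value(params.Hisat2_Index)     // consumed by Hisat2

// Hisat2's mapped_reads output channel is, in turn, RSeQC's input.
```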
1. Once a pipeline is created, you will notice the "Run" button at the top right of the page.
2. This button opens a new window where you can create a new project by clicking “Create a Project” button. After entering and saving the name of the project, it will be added to your project list.
3. Now you can select your project by clicking on it. You will then proceed by entering a run name, which will be added to the run list of the project. Clicking "Save run" will redirect you to the "run page" where you can initiate your run.
4. Here, please enter your working directory, choose your "Run Environment", click "Use Singularity Image" and enter the values below;
Work Directory: /export/tests/test1
Run Environment: Local
Image Path: dolphinnext/rnaseq:1.0
Run Options: --bind /export --bind /data
Inputs:
- bedFile: /data/genome_data/mousetest/mm10/refseq_170804/genes/genes.bed (use the Manually tab)
- Hisat2_Index: /data/genome_data/mousetest/mm10/refseq_170804/Hisat2Index/genome (use the Manually tab)
- Input_Reads: First go to the Files tab in the "Select/Add Input File" modal and click the "Add File" button. Then enter "File Directory (Full Path)" as /data/fastq_data/single and follow the Creating Collection section.
5. Now we are ready to enter the inputs we defined for the pipeline. First, choose the "Manually" tab and enter the location of the bed file. bedFile:
/data/genome_data/mousetest/mm10/refseq_170804/genes/genes.bed
6. Second, again using the "Manually" tab, enter the prefix of the Hisat2 index files. Hisat2_Index:
/data/genome_data/mousetest/mm10/refseq_170804/Hisat2Index/genome
7. Now we are ready to add files. First go to the Files tab in the "Select/Add Input File" modal and click the "Add File" button.
8. Enter the full path of the location of your files. For this test case we will use the path below. File Directory (Full Path):
/data/fastq_data/single
Then choose "Single List" for the "Collection Type" and press the "Add All Files" button.
9. Here there is an option to change the names, but we will keep them as they are, enter a collection name, and save the files.
collection name: test collection
10. In the next screen, the user can still add or remove some samples. Let's click "Save file" button to process all samples.
11. After we fill in the inputs, the page should look like below, and the orange "Waiting" button at the top right should turn into a green "Ready to Run" button:
12. Press that "Ready to Run" button.
13. The whole run should finish in a couple of minutes. When the run is finalized, the log section will look like below:
a. Logs:
14. In the report section, you can monitor all defined reports in the pipeline;
15. As you can tell from the Timeline report, the run used only one CPU and was not parallelized. To enable parallel runs, the profile for the environment should be changed.
With this change there will be 3 parallel jobs.
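As a sketch, a profile like the one below (in nextflow.config) would allow up to three local jobs to run concurrently. The profile name and values are illustrative assumptions, not the exact configuration DolphinNext ships:

```nextflow
// Hypothetical nextflow.config fragment; names and values are illustrative.
profiles {
    parallel_local {
        process.executor   = 'local'
        executor.queueSize = 3    // run up to 3 tasks at the same time
        process.cpus       = 1    // one CPU per task
    }
}
```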