- Git clone this pipeline and move into it.

```bash
$ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
$ cd chip-seq-pipeline2
```
- Download Cromwell.

```bash
$ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
$ chmod +rx cromwell-34.jar
```
- Download a SUBSAMPLED (1/400) paired-end sample of ENCSR936XTK.

```bash
$ wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/ENCSR936XTK_fastq_subsampled.tar
$ tar xvf ENCSR936XTK_fastq_subsampled.tar
```
- Download the pre-built genome database for hg38.

```bash
$ wget https://storage.googleapis.com/encode-pipeline-genome-data/test_genome_database_hg38_chip.tar
$ tar xvf test_genome_database_hg38_chip.tar
```
- Set your partition/account in `workflow_opts/slurm.json`. If your SLURM cluster does not require a partition or an account, remove the corresponding entries from this file. Otherwise, `YOUR_SLURM_PARTITION` and `YOUR_SLURM_ACCOUNT` will be used internally as `srun ... --partition YOUR_SLURM_PARTITION` and `srun ... --account YOUR_SLURM_ACCOUNT`, respectively.

```json
{
    "default_runtime_attributes" : {
        "slurm_partition": "YOUR_SLURM_PARTITION",
        "slurm_account": "YOUR_SLURM_ACCOUNT"
    }
}
```
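If you are not sure which partition or account names your cluster expects, SLURM's own tools can list them. This is only a sketch; the exact commands and output depend on your site's SLURM configuration.

```bash
$ sinfo -o "%P"                                         # list available partitions
$ sacctmgr show associations user=$USER format=Account  # list accounts you can charge jobs to
```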
Our pipeline supports both Conda and Singularity; follow whichever of the two sets of steps below matches your setup.
- For Conda users: install Conda dependencies.

```bash
$ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
$ bash conda/install_dependencies.sh
```
- Run a pipeline for the test sample.

```bash
$ source activate encode-chip-seq-pipeline  # IMPORTANT!
$ INPUT=examples/local/ENCSR936XTK_subsampled.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/slurm.json
```
- It will take about an hour. You will be able to find all outputs under `cromwell-executions/chip/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.
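The exact file layout under that directory depends on the pipeline version, but a simple `find` (shown here as a sketch) is a quick way to locate outputs of interest, e.g. any HTML reports produced by the run.

```bash
$ find cromwell-executions/chip/ -name "*.html"  # list HTML reports from all runs
```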
- See the full specification for the input JSON file.
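A practical way to build an input JSON for your own data is to start from the test sample's JSON and edit the paths in it; the keys themselves are documented in the full specification. The file name `my_sample.json` below is just an example.

```bash
$ cp examples/local/ENCSR936XTK_subsampled.json my_sample.json
$ # edit my_sample.json: replace the test FASTQ and genome database paths with your own
```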
- For Singularity users: CHECK YOUR SINGULARITY VERSION FIRST AND UPGRADE IT TO VERSION >=2.5.2, OR THE PIPELINE WILL NOT WORK CORRECTLY.

```bash
$ singularity --version
```
- Pull a Singularity container for the pipeline. This will pull the pipeline's Docker container first and then build a Singularity image under `~/.singularity`.

```bash
$ SINGULARITY_PULLFOLDER=~/.singularity singularity pull docker://quay.io/encode-dcc/chip-seq-pipeline:v1.1
```
- Run a pipeline for the test sample.

```bash
$ INPUT=examples/local/ENCSR936XTK_subsampled.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/slurm.json
```
- It will take about an hour. You will be able to find all outputs under `cromwell-executions/chip/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.
- See the full specification for the input JSON file.
- IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA/GENOME DATABASE, ADD THEIR DIRECTORIES TO `workflow_opts/slurm.json`. For example, if your input FASTQs are in `/your/input/fastqs/` and your genome database is installed in `/your/genome/database/`, then add `/your/` to `--bind` in `singularity_command_options`. You can also define multiple directories there, separated by commas.

```json
{
    "default_runtime_attributes" : {
        "singularity_container" : "~/.singularity/chip-seq-pipeline-v1.1.simg",
        "singularity_command_options" : "--bind /your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR2,..."
    }
}
```
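To sanity-check a bind path before launching a full run, you can try listing your data from inside the container. This is only a sketch; the image path is the one built in the pull step above and the data directory is the example path from this step, so substitute your own.

```bash
$ singularity exec --bind /your/ ~/.singularity/chip-seq-pipeline-v1.1.simg ls /your/input/fastqs/
```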
- If you want to run multiple (>10) pipelines, run a Cromwell server on an interactive node. We recommend using `screen` or `tmux` to keep your session alive (see the sketch after the server commands below), and note that all running pipelines will be killed when the walltime expires. Get an interactive node with the following commands, then run a Cromwell server on it as shown below. You can skip `-p [YOUR_SLURM_PARTITION]` or `--account [YOUR_SLURM_ACCOUNT]` according to your cluster's SLURM configuration.

```bash
$ srun -n 2 --mem 5G -t 3-0 --qos normal -p [YOUR_SLURM_PARTITION] --account [YOUR_SLURM_ACCOUNT] --pty /bin/bash -i -l  # 2 CPUs, 5 GB RAM and 3-day walltime
$ hostname -f  # to get [CROMWELL_SVR_IP]
```
For Conda users:

```bash
$ source activate encode-chip-seq-pipeline
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar server
```
For Singularity users:

```bash
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar server
```
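As a minimal sketch of the `screen` workflow mentioned above (`tmux` works similarly): start a named session on the interactive node before launching the server, detach with `Ctrl-A d`, and reattach later to check on it.

```bash
$ screen -S cromwell  # start a named session, then launch the Cromwell server inside it
# ... detach with Ctrl-A d; later, from the same node:
$ screen -r cromwell  # reattach to the running server session
```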
- You can modify `backend.providers.slurm.concurrent-job-limit` or `backend.providers.slurm_singularity.concurrent-job-limit` in `backends/backend.conf` to increase the maximum number of concurrent jobs. This limit is not per sample; it applies to all sub-tasks of all submitted samples.
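To locate these settings in the file (line numbers and default values depend on your copy of `backends/backend.conf`):

```bash
$ grep -n "concurrent-job-limit" backends/backend.conf
```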
On a login node, submit jobs to the cromwell server. You will get
[WORKFLOW_ID]
as a return value. Keep these workflow IDs for monitoring pipelines and finding outputs for a specific sample later.$ INPUT=YOUR_INPUT.json $ curl -X POST --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \ -F [email protected] \ -F workflowInputs=@${INPUT} \ -F workflowOptions=@workflow_opts/slurm.json
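The response is a small JSON document containing the workflow ID. If `jq` happens to be available on your login node, one way (an illustrative sketch, not part of the pipeline) to capture the ID directly from the same submission command is:

```bash
$ WORKFLOW_ID=$(curl -s -X POST --header "Accept: application/json" \
    "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \
    -F workflowSource=@chip.wdl \
    -F workflowInputs=@${INPUT} \
    -F workflowOptions=@workflow_opts/slurm.json | jq -r '.id')
$ echo ${WORKFLOW_ID}
```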
To monitor pipelines, see the Cromwell server REST API description for more details; `squeue` will not give you enough information to monitor jobs per sample.

```bash
$ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/status"
```
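The server also exposes per-workflow `metadata` and `outputs` endpoints, which are useful for finding the outputs of a specific sample once it has finished; see the Cromwell REST API documentation for the full response format.

```bash
$ curl -X GET --header "Accept: application/json" "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/metadata"
$ curl -X GET --header "Accept: application/json" "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/outputs"
```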