- Git clone this pipeline and move into it.

```bash
$ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
$ cd chip-seq-pipeline2
```
- Download Cromwell.

```bash
$ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
$ chmod +rx cromwell-34.jar
```
- Download a SUBSAMPLED (1/400) paired-end sample of ENCSR936XTK.

```bash
$ wget https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/ENCSR936XTK_fastq_subsampled.tar
$ tar xvf ENCSR936XTK_fastq_subsampled.tar
```
- Download the pre-built genome database for hg38.

```bash
$ wget https://storage.googleapis.com/encode-pipeline-genome-data/test_genome_database_hg38_chip.tar
$ tar xvf test_genome_database_hg38_chip.tar
```
- Set your partition/account in `workflow_opts/slurm.json`. If your SLURM cluster does not require a partition or an account, remove the corresponding entries from this file. Otherwise, `YOUR_SLURM_PARTITION` and `YOUR_SLURM_ACCOUNT` will be used internally as `srun ... --partition YOUR_SLURM_PARTITION` and `srun ... --account YOUR_SLURM_ACCOUNT`, respectively.

```json
{
    "default_runtime_attributes" : {
        "slurm_partition": "YOUR_SLURM_PARTITION",
        "slurm_account": "YOUR_SLURM_ACCOUNT"
    }
}
```
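If you are not sure which partition or account names your cluster expects, SLURM's own tools can list them. This is only a sketch; the exact commands and output depend on your site's SLURM configuration.

```bash
$ sinfo -o "%P"                                         # list available partitions
$ sacctmgr show associations user=$USER format=Account  # list accounts you can charge jobs to
```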
Our pipeline supports both Conda and Singularity; follow whichever of the two sets of steps below matches your setup.
- For Conda users: install Conda dependencies.

```bash
$ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
$ bash conda/install_dependencies.sh
```
- Run a pipeline for the test sample.

```bash
$ source activate encode-chip-seq-pipeline  # IMPORTANT!
$ INPUT=examples/local/ENCSR936XTK_subsampled.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/slurm.json
```
- It will take about an hour. You will be able to find all outputs under `cromwell-executions/chip/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.
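The exact file layout under that directory depends on the pipeline version, but a simple `find` (shown here as a sketch) is a quick way to locate outputs of interest, e.g. any HTML reports produced by the run.

```bash
$ find cromwell-executions/chip/ -name "*.html"  # list HTML reports from all runs
```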
- See the full specification for the input JSON file.
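A practical way to build an input JSON for your own data is to start from the test sample's JSON and edit the paths in it; the keys themselves are documented in the full specification. The file name `my_sample.json` below is just an example.

```bash
$ cp examples/local/ENCSR936XTK_subsampled.json my_sample.json
$ # edit my_sample.json: replace the test FASTQ and genome database paths with your own
```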
- For Singularity users: CHECK YOUR SINGULARITY VERSION FIRST AND UPGRADE IT TO VERSION >=2.5.2, OR THE PIPELINE WILL NOT WORK CORRECTLY.

```bash
$ singularity --version
```
- Pull a Singularity container for the pipeline. This will pull the pipeline's Docker container first and then build a Singularity image under `~/.singularity`.

```bash
$ SINGULARITY_PULLFOLDER=~/.singularity singularity pull docker://quay.io/encode-dcc/chip-seq-pipeline:v1.1
```
- Run a pipeline for the test sample.

```bash
$ INPUT=examples/local/ENCSR936XTK_subsampled.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar run chip.wdl -i ${INPUT} -o workflow_opts/slurm.json
```
- It will take about an hour. You will be able to find all outputs under `cromwell-executions/chip/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.
- See the full specification for the input JSON file.
- IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA/GENOME DATABASE, ADD THEIR DIRECTORIES TO `workflow_opts/slurm.json`. For example, if your input FASTQs are in `/your/input/fastqs/` and your genome database is installed in `/your/genome/database/`, then add `/your/` to `--bind` in `singularity_command_options`. You can also define multiple directories there, separated by commas.

```json
{
    "default_runtime_attributes" : {
        "singularity_container" : "~/.singularity/chip-seq-pipeline-v1.1.simg",
        "singularity_command_options" : "--bind /your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR2,..."
    }
}
```
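To sanity-check a bind path before launching a full run, you can try listing your data from inside the container. This is only a sketch; the image path is the one built in the pull step above and the data directory is the example path from this step, so substitute your own.

```bash
$ singularity exec --bind /your/ ~/.singularity/chip-seq-pipeline-v1.1.simg ls /your/input/fastqs/
```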
- If you want to run multiple (>10) pipelines, run a Cromwell server on an interactive node. We recommend using `screen` or `tmux` to keep your session alive (see the sketch after the server commands below), and note that all running pipelines will be killed when the walltime expires. Get an interactive node with the following commands, then run a Cromwell server on it as shown below. You can skip `-p [YOUR_SLURM_PARTITION]` or `--account [YOUR_SLURM_ACCOUNT]` according to your cluster's SLURM configuration.

```bash
$ srun -n 2 --mem 5G -t 3-0 --qos normal -p [YOUR_SLURM_PARTITION] --account [YOUR_SLURM_ACCOUNT] --pty /bin/bash -i -l  # 2 CPUs, 5 GB RAM and 3-day walltime
$ hostname -f  # to get [CROMWELL_SVR_IP]
```
For Conda users:

```bash
$ source activate encode-chip-seq-pipeline
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar server
```
For Singularity users:

```bash
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar server
```
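As a minimal sketch of the `screen` workflow mentioned above (`tmux` works similarly): start a named session on the interactive node before launching the server, detach with `Ctrl-A d`, and reattach later to check on it.

```bash
$ screen -S cromwell  # start a named session, then launch the Cromwell server inside it
# ... detach with Ctrl-A d; later, from the same node:
$ screen -r cromwell  # reattach to the running server session
```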
- You can modify `backend.providers.slurm.concurrent-job-limit` or `backend.providers.slurm_singularity.concurrent-job-limit` in `backends/backend.conf` to increase the maximum number of concurrent jobs. This limit is not per sample; it applies to all sub-tasks of all submitted samples.
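To locate these settings in the file (line numbers and default values depend on your copy of `backends/backend.conf`):

```bash
$ grep -n "concurrent-job-limit" backends/backend.conf
```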
On a login node, submit jobs to the cromwell server. You will get
[WORKFLOW_ID]
as a return value. Keep these workflow IDs for monitoring pipelines and finding outputs for a specific sample later.$ INPUT=YOUR_INPUT.json $ curl -X POST --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \ -F [email protected] \ -F workflowInputs=@${INPUT} \ -F workflowOptions=@workflow_opts/slurm.json
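The response is a small JSON document containing the workflow ID. If `jq` happens to be available on your login node, one way (an illustrative sketch, not part of the pipeline) to capture the ID directly from the same submission command is:

```bash
$ WORKFLOW_ID=$(curl -s -X POST --header "Accept: application/json" \
    "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \
    -F workflowSource=@chip.wdl \
    -F workflowInputs=@${INPUT} \
    -F workflowOptions=@workflow_opts/slurm.json | jq -r '.id')
$ echo ${WORKFLOW_ID}
```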
To monitor pipelines, see the Cromwell server REST API description for more details; `squeue` will not give you enough information to monitor jobs per sample.

```bash
$ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/status"
```
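The server also exposes per-workflow `metadata` and `outputs` endpoints, which are useful for finding the outputs of a specific sample once it has finished; see the Cromwell REST API documentation for the full response format.

```bash
$ curl -X GET --header "Accept: application/json" "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/metadata"
$ curl -X GET --header "Accept: application/json" "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/outputs"
```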