Run large scale HADDOCK simulations using multiple input molecules in different scenarios

haddocking/haddock-runner

haddock-runner for HADDOCK

Table of contents

  1. Introduction
  2. Installation
  3. Usage
  4. Development

What is this repository for?

This repository contains a set of tools to benchmark the performance of HADDOCK2.4/HADDOCK3.0.

These can be used to compare the performance of HADDOCK against other software packages, to compare the performance of different versions of HADDOCK, or to compare the performance of HADDOCK on different hardware.

Additionally, it can be used to perform large-scale docking experiments in different scenarios (parameters), for example:

  • You have obtained experimental data for a set of proteins and you want to dock them against a set of targets. You want to test different parameters to see which one gives the best results.
    • Scenario 1: Use all information
    • Scenario 2: Use only 50% of the information
    • Scenario 3: ab initio docking (without information)

How do I get set up?

To run the benchmarking tools, you need to have a working (local) installation of HADDOCK2.4. The software is free for academic use and can be obtained via registration. More information can be obtained from the HADDOCK website.

For more information on how to install HADDOCK, please refer to the documentation and also to the HADDOCK.md file in this repository.

Previous version

"Hey, what happened to the previous version of this repository? Where is the python code?!" - you might ask.

The previous version of this repository was indeed written in Python, but it has been migrated to Go. The main reasons are that the Python version was slow, not very efficient (or well designed), and it had no tests...! The Go version is faster, more efficient, and easier to maintain - see the code coverage.

However, you can still find the Python version under the v0.2.1 tag of this repository.

Requirements

Installation

Clone the repository

git clone https://github.com/haddocking/haddock-runner.git
cd haddock-runner
go build -o haddock-runner
./haddock-runner -version

OR

Use the pre-compiled binaries from the latest release

Usage

Check the step-by-step tutorial at bonvinlab.org/education/haddock-runner

Usage: haddockrunner [options] <input file>

Run HADDOCK benchmarking

Options:
  -version: Print version and exit

  • input.yml

The input of haddock-runner is a .yml file; YAML is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. For more information, please refer to the YAML website.

Currently haddock-runner can be executed for both the production-stable version 2.4 and the experimental 3.0.0-beta2. The input file is slightly different for each version. The input file for version 2.4 is described below.

The input file for version 3.0.0-beta2 is described HERE.

An example file is provided in the examples folder (example_input.yml) and also below. It is composed of two main sections: general, which defines the general parameters of the benchmarking experiment, and scenarios, which defines the different scenarios to be tested.

# General parameters
general:
  # Location of the HADDOCK script (see more below)
  executable: /trinity/login/rodrigo/projects/benchmarking/haddock24.sh
  # How many jobs should be executed at a given time
  max_concurrent: 2
  # Location where HADDOCK is installed
  haddock_dir: /trinity/login/abonvin/haddock_git/haddock2.4
  # Pattern used to identify the receptor files
  receptor_suffix: _r_u
  # Pattern used to identify the ligand files
  ligand_suffix: _l_u
  # Location of the input list
  input_list: /trinity/login/rodrigo/projects/benchmarking/input.txt
  # Location of the benchmark output
  work_dir: /trinity/login/rodrigo/projects/benchmarking

# Scenarios of the benchmarking experiment
scenarios:
  # Name can be anything you want to identify the scenario
  - name: true-interface
    # Parameters to be used in the scenario
    parameters:
      # The parameters below are the same as the
      #  ones used in the HADDOCK input file `run.cns`
      run_cns:
        noecv: false
        structures_0: 2
        structures_1: 2
        waterrefine: 2
      # Patterns used to identify the restraints files
      restraints:
        ambig: ti
        unambig: unambig
        hbonds: hb
      # Patterns used to identify the custom topology files
      custom_toppar:
        topology: _ligand.top
        param: _ligand.param

  - name: center-of-mass
    parameters:
      run_cns:
        cmrest: true
        structures_0: 2
        structures_1: 2
        waterrefine: 2
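
The layout of the input file can be mirrored in Go types, which makes the two-section structure easier to see. The sketch below is only an illustration: the type names (Config, General, Scenario), the field names, and the validation rules are assumptions made for this example and are not taken from the haddock-runner source. Decoding the YAML itself is left out because it would need a third-party library.

```go
package main

import (
	"errors"
	"fmt"
)

// General mirrors the `general` section of the example input file.
// Type and field names are hypothetical, chosen for this sketch.
type General struct {
	Executable     string
	MaxConcurrent  int
	HaddockDir     string
	ReceptorSuffix string
	LigandSuffix   string
	InputList      string
	WorkDir        string
}

// Scenario mirrors one entry of the `scenarios` section.
type Scenario struct {
	Name         string
	RunCNS       map[string]interface{} // e.g. "structures_0": 2, "cmrest": true
	Restraints   map[string]string      // e.g. "ambig": "ti"
	CustomToppar map[string]string      // e.g. "topology": "_ligand.top"
}

// Config groups both sections of the input file.
type Config struct {
	General   General
	Scenarios []Scenario
}

// Validate runs a few sanity checks. The rules are assumptions, not the
// checks that haddock-runner itself performs.
func (c Config) Validate() error {
	if c.General.Executable == "" {
		return errors.New("general.executable is empty")
	}
	if c.General.MaxConcurrent < 1 {
		return errors.New("general.max_concurrent must be at least 1")
	}
	if c.General.ReceptorSuffix == c.General.LigandSuffix {
		return errors.New("receptor_suffix and ligand_suffix must differ")
	}
	if len(c.Scenarios) == 0 {
		return errors.New("no scenarios defined")
	}
	seen := map[string]bool{}
	for _, s := range c.Scenarios {
		if s.Name == "" {
			return errors.New("scenario without a name")
		}
		if seen[s.Name] {
			return fmt.Errorf("duplicate scenario name %q", s.Name)
		}
		seen[s.Name] = true
	}
	return nil
}

func main() {
	cfg := Config{
		General: General{
			Executable:     "/path/to/haddock24.sh", // placeholder path
			MaxConcurrent:  2,
			ReceptorSuffix: "_r_u",
			LigandSuffix:   "_l_u",
		},
		Scenarios: []Scenario{
			{Name: "true-interface", RunCNS: map[string]interface{}{"noecv": false, "structures_0": 2}},
			{Name: "center-of-mass", RunCNS: map[string]interface{}{"cmrest": true}},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config looks consistent")
}
```

Keeping run_cns as a free-form map reflects that its keys are passed through as run.cns parameters rather than being a fixed set.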

  • haddock24.sh

The haddock24.sh script is a wrapper around the HADDOCK2.4 executable. It is used to run HADDOCK in a given folder, and it is called by haddock-runner for each scenario. The script is provided in the examples folder (haddock24.sh) and also below.

Important: Keep in mind that HADDOCK2.4 runs on Python 2.7, which is unlikely to be installed on recent systems. For tips on how to install it, please refer to the PYTHON2.md file in this repository.

#!/bin/bash
#===============================================================
# HADDOCK2.4 wrapper script
#===============================================================

# Export the required environment variables
export HADDOCK="/trinity/login/abonvin/haddock_git/haddock2.4"
export HADDOCKTOOLS="$HADDOCK/tools"
export PYTHONPATH="${PYTHONPATH}:$HADDOCK"

# Command to run HADDOCK
"$(which python2.7)" "$HADDOCK/Haddock/RunHaddock.py"
#===============================================================
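
Since the wrapper is called once per scenario and max_concurrent caps how many jobs run at a given time, a bounded worker pattern is one way to picture the scheduling. The sketch below is a generic Go idiom (a buffered channel used as a counting semaphore), not the scheduler that haddock-runner actually implements. The function name runLimited and the job signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited runs every job while allowing at most maxConcurrent of them
// to run at the same time. Errors are returned in the order of the jobs.
// Hypothetical helper: in a real runner each job might launch the wrapper
// script inside its own run folder (for instance via os/exec).
func runLimited(jobs []func() error, maxConcurrent int) []error {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	errs := make([]error, len(jobs))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent jobs are running
		go func(i int, job func() error) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the job ends
			errs[i] = job()
		}(i, job)
	}
	wg.Wait()
	return errs
}

func main() {
	names := []string{"true-interface", "center-of-mass"}
	jobs := make([]func() error, len(names))
	for i, name := range names {
		name := name // capture the loop variable for the closure
		jobs[i] = func() error {
			fmt.Println("running scenario", name)
			return nil
		}
	}
	for i, err := range runLimited(jobs, 2) {
		if err != nil {
			fmt.Println(names[i], "failed:", err)
		}
	}
}
```

Acquiring the slot before starting the goroutine keeps the number of live goroutines bounded as well, which matters when the job list is long.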

  • input_list.txt

The input list is a simple text file that lists the input files to be used in the benchmarking experiment, one path per line. The input files must be:

  • .pdb: for receptor and ligand files
  • .top: for custom topology files (used for small-molecules)
  • .param: for custom parameter files (used for small-molecules)
  • .tbl: a table file containing the restraints to be used in the docking experiment

haddock-runner parses this list and identifies each file according to the patterns set in input.yml. Lines beginning with # are ignored and can be used to document the input list for future reference; in-line comments are not supported.

An example file is provided in the examples folder (example_input_list.txt) and also below.

#  Lines starting with # are comments
# ------------------------------------------------------------ #
# Input list
# ------------------------------------------------------------ #
# 1A2K
example/1A2K/1A2K_r_u.pdb
example/1A2K/1A2K_l_u.pdb
example/1A2K/1A2K_ligand.top
example/1A2K/1A2K_ligand.param
example/1A2K/1A2K_ti.tbl
example/1A2K/1A2K_unambig.tbl
# 1GGR
example/1GGR/1GGR_r_u.pdb
example/1GGR/1GGR_l_u_1.pdb
example/1GGR/1GGR_l_u_2.pdb
example/1GGR/1GGR_l_u_3.pdb
example/1GGR/1GGR_l_u_4.pdb
example/1GGR/1GGR_l_u_5.pdb
example/1GGR/1GGR_ti.tbl
# 1PPE
example/1PPE/1PPE_l_u.pdb
example/1PPE/1PPE_r_u.pdb
example/1PPE/1PPE_ti.tbl
example/1PPE/1PPE_hb.tbl
example/1PPE/1PPE_unambig.tbl
# 2OOB
example/2OOB/2OOB_l_u.pdb
example/2OOB/2OOB_r_u.pdb
example/2OOB/2OOB_ti.tbl
example/2OOB/2OOB_hb.tbl
# ------------------------------------------------------------ #
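
The comment rule described above can be sketched in a few lines of Go. This is a minimal illustration, not the parser used by haddock-runner: the function name parseInputList is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions. In-line comments are deliberately left untouched, since they are not supported.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInputList returns the paths found in content, one per line.
// Lines beginning with '#' are skipped. Blank lines are skipped and
// whitespace is trimmed as well (assumptions of this sketch).
// Anything after a path, including a trailing "# ...", is kept as-is.
func parseInputList(content string) []string {
	var paths []string
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		paths = append(paths, line)
	}
	return paths
}

func main() {
	list := `# 1A2K
example/1A2K/1A2K_r_u.pdb
example/1A2K/1A2K_l_u.pdb
example/1A2K/1A2K_ti.tbl
`
	for _, p := range parseInputList(list) {
		fmt.Println(p)
	}
}
```

To read a list from disk, the file contents (for example from os.ReadFile) can be converted to a string and passed in.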

Ensembles: multiple conformations of a receptor or ligand are also supported. They must follow the naming convention <root><receptor or ligand suffix>_N.pdb, where N is the number of the model within the ensemble. For example, if the ligand suffix is _l_u, the ligand files of an ensemble would be:

<root>_l_u_1.pdb
<root>_l_u_2.pdb
<root>_l_u_3.pdb

Important: Do not provide a multi-model ensemble file, instead provide the individual models.
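
The suffix and ensemble naming rules can be expressed as a small matcher. The sketch below is an assumption about how such matching could work, not the logic haddock-runner uses: the Molecule type and classifyPDB function are hypothetical, and the suffixes are taken to include their leading underscore, as in the example input file.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strconv"
)

// Molecule describes how a PDB file name was interpreted.
type Molecule struct {
	Root  string // e.g. "1GGR"
	Role  string // "receptor" or "ligand"
	Model int    // N for ensemble members, 0 for a single structure
}

// classifyPDB matches <root><suffix>.pdb or <root><suffix>_N.pdb against
// the receptor suffix first and the ligand suffix second.
// The second return value is false when neither pattern matches.
func classifyPDB(path, receptorSuffix, ligandSuffix string) (Molecule, bool) {
	name := filepath.Base(path)
	roles := []struct{ role, suffix string }{
		{"receptor", receptorSuffix},
		{"ligand", ligandSuffix},
	}
	for _, r := range roles {
		re := regexp.MustCompile(`^(.+)` + regexp.QuoteMeta(r.suffix) + `(?:_(\d+))?\.pdb$`)
		m := re.FindStringSubmatch(name)
		if m == nil {
			continue
		}
		mol := Molecule{Root: m[1], Role: r.role}
		if m[2] != "" {
			if n, err := strconv.Atoi(m[2]); err == nil {
				mol.Model = n
			}
		}
		return mol, true
	}
	return Molecule{}, false
}

func main() {
	files := []string{
		"example/1GGR/1GGR_r_u.pdb",
		"example/1GGR/1GGR_l_u_1.pdb",
		"example/1GGR/1GGR_l_u_2.pdb",
		"example/1GGR/1GGR_ti.tbl",
	}
	for _, f := range files {
		if mol, ok := classifyPDB(f, "_r_u", "_l_u"); ok {
			fmt.Printf("%s: root=%s role=%s model=%d\n", f, mol.Root, mol.Role, mol.Model)
		} else {
			fmt.Printf("%s: not a receptor or ligand PDB\n", f)
		}
	}
}
```

Grouping the results by Root and Role would then collect all members of an ensemble for a given target.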

pending