Tool to predict water molecules placement and energy in ligand binding sites
You need, at a minimum (requirements):
- Python (>=3.7)
- OpenBabel
- Numpy
- Scipy
- Pandas
- tqdm (progress bar)
- AutoDock Vina and AutoGrid (for generating maps)
- AmberTools (protein preparation and gist calculations)
- OpenMM (minimization)
- ParmED (files conversion)
- gridData (read dx files from GIST)
- Sphinx (documentation)
- Sphinx_rtd_theme (documentation)
I highly recommand you to install the Anaconda distribution (https://www.continuum.io/downloads) if you want a clean python environnment with nearly all the prerequisites already installed. To install everything properly, you just have to do this:
$ conda create -n waterkit -c conda-forge python=3 mkl numpy scipy pandas \
openbabel parmed ambertools openmm netcdf4 griddataformats tqdm \
sphinx sphinx_rtd_theme
$ conda activate waterkit
$ pip install vina
We can now install the WaterKit
package
$ git clone https://github.com/forlilab/waterkit
$ cd waterkit
$ pip install -e .
Finally we will need to compile a special version of autogrid
. This version supports up to 56 different atom types. You will need it to run WaterKit.
$ cd autodocksuite-4.2.6-src/autogrid
$ autoreconf -i
$ mkdir x86_64Linux2 # for x86_64architecture
$ cd x86_64Linux2
$ ../configure
$ make
$ make install # optional
Conversion to PDBQT using AmberTools and wk_prepare_receptor.py
script. If you added hydrogen atoms with REDUCE
, you can keep them by using the --keep_hydrogen
argument. The protonation states for histidine residues (HIE
, HID
, HIP
) will then be automatically assigned based the presence of hydrogen atoms on NE2
or ND1
. Histidine residues already set HIE
, HID
or HIP
will remain unchanged. Per default, the protonation state of histidine residues is set to HIE
.
$ wk_prepare_receptor.py -i protein.pdb -o protein_prepared --pdb --amber_pdbqt
The following protein coordinate files will be generated:protein_prepared.pdb
and protein_prepared_amber.pdbqt
. The PDBQT file will be used by WaterKit and the PDB file will be used to create the trajectory file at the end.
Run WaterKit
$ mkdir traj
# Generate 10.000 frames using 16 cpus (tip3p, 300 K, 3 hydration layers)
# X Y Z define the center and SX SY SZ the size (in Angstrom) of the box
# If it has issue locating autogrid, specify the path with the argument --autogrid_exec_path
$ run_waterkit.py -i protein_prepared_amber.pdbqt -c X Y Z -s SX SY SZ -n 10000 -j 16 -o traj
# Create ensemble trajectory
$ wk_make_trajectory.py -r protein_prepared.pdb -w traj -o protein
# Minimize each conformation (100 steps, 2.5 kcal/mol/A**2 restraints on heavy atoms, CUDA)
$ wk_minimize_trajectory.py -p protein_system.prmtop -t protein_system.nc -o protein_min.nc
- Create input file for cpptraj
# gist.inp
parm protein_system.prmtop
trajin protein.nc
# X Y Z define the center and GX GY GZ the size (in gridpoints) of the box
# Example: if SX = SY = SZ = 24 Angstrom and gridspacn = 0.5, then GX = GY = GZ = 48
gist gridspacn 0.5 gridcntr X Y Z griddim GX GY GZ
go
quit
Usually you would choose the same parameters as the AutoGrid maps (npts
and gridcenter
). Unlike AutoGrid, the default the grid spacing in GIST is 0.5 A, so you will have to choose box dimension accordingly to match the Autogrid maps dimensions. More informations on GIST are available here: https://amber-md.github.io/cpptraj/CPPTRAJ.xhtml#magicparlabel-4672
- Run GIST
$ cpptraj -i gist.inp
Hydration sites are identified based on the oxygen density map (gO) in an iterative way, by selecting the voxel with the highest density, then the second highest and so on, whil keeping a minimum distance of 2.5 A between them. Energies for each of those identified hydration sites are then computed by adding all the surrounding voxels and Gaussian weighted by their distances from the hydration site.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
from gridData import Grid
from waterkit.analysis import HydrationSites
from waterkit.analysis import blur_map
gO = Grid("gist-gO.dx")
esw = Grid('gist-Esw-dens.dx')
eww = Grid('gist-Eww-dens.dx')
tst = Grid('gist-dTStrans-dens.dx')
tso = Grid('gist-dTSorient-dens.dx')
dg = (esw + 2 * eww) - (tst + tso)
# Identification of hydration site positions using gO
hs = HydrationSites(gridsize=0.5, water_radius=1.4, min_water_distance=2.5, min_density=2.0)
hydration_sites = hs.find(gO) # can pass "gist-gO.dx" directly also
# Get Gaussian smoothed energy for hydration sites only
dg_energy = hs.hydration_sites_energy(dg, water_radius=1.4)
hs.export_to_pdb("hydration_sites_dG_smoothed.pdb", hydration_sites, dg_energy)
# ... or get the whole Gaussian smoothed map
map_smooth = blur_map(dg, radius=1.4)
map_smooth.export("gist-dG-dens_smoothed.dx")