AEV-PLIG

AEV-PLIG is a GNN-based scoring function that predicts the binding affinity of a bound protein-ligand complex given its 3D structure.

Installation guide
Demo

Installation guide

AEV-PLIG has been tested on the following systems:

macOS: Monterey (12.5.1)

Create conda environment

Installation time varies by system; it took around 30 seconds on a Mac M1. For macOS:

conda env create --file aev-plig-mac.yml

For Linux:

conda env create --file aev-plig-linux.yml

Alternatively, install the packages manually:

conda create --name aev-plig python=3.8
conda activate aev-plig
pip install torch torchvision torchaudio
pip install torch-scatter
pip install torch_geometric
pip install rdkit
pip install torchani
pip install qcelemental
pip install pandas
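After either installation route, a quick import check can confirm that the environment is complete. The snippet below is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and the module list assumes the usual import names for the packages above (for example, torch-scatter is imported as torch_scatter).

```python
import importlib.util

# Import names for the packages installed above. The import name can differ
# from the pip name (torch-scatter is imported as torch_scatter).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "torch_scatter",
    "torch_geometric", "rdkit", "torchani", "qcelemental", "pandas",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated aev-plig environment; any name it reports can be installed with the matching pip command above.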

Demo

This section demonstrates how to train your own AEV-PLIG model, and how to use AEV-PLIG to make predictions.

The computational requirements for each script are included, and unless otherwise specified, the hardware used is a Mac M1 CPU.

Training

Download training data

Download the training datasets, PDBbind and BindingNet:

wget http://pdbbind.org.cn/download/PDBbind_v2020_other_PL.tar.gz
wget http://pdbbind.org.cn/download/PDBbind_v2020_refined.tar.gz
wget http://bindingnet.huanglab.org.cn/api/api/download/binding_database

Put PDBbind data into data/pdbbind/refined-set and data/pdbbind/general-set

Put BindingNet data into data/bindingnet/from_chembl_client
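The target directories named above may not exist in a fresh checkout. The following is a small sketch that creates them before the downloaded archives are unpacked; the helper name ensure_data_dirs and its root argument are hypothetical, and only the three directory paths come from this README.

```python
from pathlib import Path

# Directory layout described in this README.
DATA_DIRS = [
    "data/pdbbind/refined-set",
    "data/pdbbind/general-set",
    "data/bindingnet/from_chembl_client",
]


def ensure_data_dirs(root="."):
    """Create the expected data directories under root and return their paths."""
    paths = []
    for relative in DATA_DIRS:
        path = Path(root) / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling ensure_data_dirs() from the repository root prepares the layout; the extracted PDBbind and BindingNet files still need to be moved into place by hand.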

Generate PDBbind and BindingNet graphs

The following scripts generate the graphs and save them to pdbbind.pickle and bindingnet.pickle. Together they take around 30 minutes to run.

python generate_pdbbind_graphs.py
python generate_bindingnet_graphs.py

Generate data for pytorch

Running this script takes around 2 minutes.

python create_pytorch_data.py

The script outputs the following files in data/processed/:

pdbbind_U_bindingnet_ligsim90_train.pt, pdbbind_U_bindingnet_ligsim90_valid.pt, and pdbbind_U_bindingnet_ligsim90_test.pt
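Before starting a long training run, it can help to confirm that all three split files were written. The check below is a sketch under the naming pattern listed above (dataset name followed by _train, _valid, or _test); the function names are hypothetical and not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")


def expected_split_files(dataset, processed_dir="data/processed"):
    """Return the paths of the train, valid and test files for a dataset name."""
    return [Path(processed_dir) / f"{dataset}_{split}.pt" for split in SPLITS]


def missing_split_files(dataset, processed_dir="data/processed"):
    """Return the expected split files that do not exist on disk."""
    return [p for p in expected_split_files(dataset, processed_dir) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_split_files("pdbbind_U_bindingnet_ligsim90")
    for path in missing:
        print("Missing:", path)
    if not missing:
        print("All split files are present.")
```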

Run training

Running the following script takes 25 hours using an NVIDIA GeForce GTX 1080 Ti GPU. The next section describes how to use a trained model for predictions.

python training.py --activation_function=leaky_relu --batch_size=128 --dataset=pdbbind_U_bindingnet_ligsim90 --epochs=200 --head=3 --hidden_dim=256 --lr=0.00012291937615434127 --model=GATv2Net

The trained models are saved in output/trained_models.
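The training command carries many flags, so it can be convenient to keep them in one place and assemble the command programmatically. This is an illustrative sketch only: the flag names and values are copied from the command above, while the dictionary and the helper build_training_command are hypothetical. Values are stored as strings so that the learning rate is passed through exactly as written.

```python
import sys

# Flags and values taken from the training command in this README.
HYPERPARAMETERS = {
    "activation_function": "leaky_relu",
    "batch_size": "128",
    "dataset": "pdbbind_U_bindingnet_ligsim90",
    "epochs": "200",
    "head": "3",
    "hidden_dim": "256",
    "lr": "0.00012291937615434127",
    "model": "GATv2Net",
}


def build_training_command(params, script="training.py"):
    """Return an argument list that runs the script with --key=value flags."""
    return [sys.executable, script] + [f"--{key}={value}" for key, value in params.items()]
```

The resulting list can be passed to subprocess.run(build_training_command(HYPERPARAMETERS), check=True) from the repository root, which is equivalent to typing the command by hand.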

Predictions

To make predictions, provide a .csv file with the following columns:

unique_id, unique identifier for the datapoint
pK, binding affinity label in pK units
sdf_file, relative path to the ligand .sdf file
pdb_file, relative path to the protein .pdb file
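A dataset file can be checked against this column list before it is passed to the prediction script. The sketch below uses only the standard library; the function check_dataset_csv is hypothetical, and it assumes that the relative paths are resolved against a base directory you supply (the README does not state which directory they are relative to).

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ["unique_id", "pK", "sdf_file", "pdb_file"]


def check_dataset_csv(csv_path, base_dir="."):
    """Return a list of human-readable problems found in the dataset CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        seen = set()
        for row in reader:
            uid = row["unique_id"]
            if uid in seen:
                problems.append(f"{uid}: duplicate unique_id")
            seen.add(uid)
            # The label must parse as a number.
            try:
                float(row["pK"])
            except (TypeError, ValueError):
                problems.append(f"{uid}: pK is not a number")
            # Assumption: paths are relative to base_dir.
            for column in ("sdf_file", "pdb_file"):
                value = row[column] or ""
                if not (Path(base_dir) / value).is_file():
                    problems.append(f"{uid}: {column} not found: {value}")
    return problems
```

An empty list means the file has the required columns, unique identifiers, numeric labels, and structure files that exist on disk.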

An example dataset is included in data/example_dataset.csv for this demo.

python process_and_predict.py --dataset_csv=data/example_dataset.csv --data_name=example --trained_model_name=20231116-181233_model_GATv2Net_pdbbind_core

The script processes data in dataset_csv, and removes datapoints if:

The .sdf file cannot be read by RDKit
The molecule contains a rare element
The molecule has an undefined bond type

The script then creates graphs and PyTorch data to run the AEV-PLIG model specified with trained_model_name. The default is AEV-PLIG trained on PDBbind v2020, but we recommend using AEV-PLIG trained on both PDBbind v2020 and BindingNet.

The predictions are saved under output/predictions/data_name_predictions.csv, where data_name is the value passed with --data_name.

For the example dataset, the script takes around 20 seconds to run.
