A PyTorch implementation of the paper:
PerceiverCPI: A nested cross-attention network for compound-protein interaction prediction
Ngoc-Quang Nguyen, Gwanghoon Jang, Hajung Kim, and Jaewoo Kang
Our repository uses https://github.com/chemprop/chemprop as a backbone for compound information extraction. We highly recommend that researchers read the D-MPNN paper to better understand how it is used.
Motivation: Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information.
Results: We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA, and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments.
Set up the environment:
In our experiments, we use Python 3.9 with PyTorch 1.7.1 and CUDA 10.1.
git clone https://github.com/dmis-lab/PerceiverCPI.git
conda env create -f environment.yml
The data should be a CSV file with the columns 'smiles', 'sequences', and 'label'.
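Before training, it can help to confirm that a file really has the three expected columns. The following is a minimal sketch using only the Python standard library; the helper name `check_cpi_csv` and the example rows are hypothetical and are not part of this repository.

```python
import csv
import io

# Column names required by the data format described above.
REQUIRED_COLUMNS = ("smiles", "sequences", "label")


def check_cpi_csv(handle):
    """Return the number of data rows; raise ValueError if the file is malformed.

    Hypothetical helper for illustration, not part of the repository.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["smiles"] or not row["sequences"]:
            raise ValueError(f"line {line_no}: empty smiles or sequences")
        try:
            float(row["label"])  # regression labels must be numeric
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: label is not numeric")
        count += 1
    return count


# Made-up rows, for illustration only.
example = io.StringIO(
    "smiles,sequences,label\n"
    "CCO,MKTAYIAKQR,5.0\n"
    "c1ccccc1,MSLLTEVETP,7.2\n"
)
n_rows = check_cpi_csv(example)
print(n_rows)
```

For a real dataset, pass an open file instead of the in-memory example, for instance `open(path, newline="")`.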
The supplementary material can be found: HERE
To train a model:
python train.py --data_path "datasetpath" --separate_val_path "validationpath" --separate_test_path "testpath" --metric mse --dataset_type regression --save_dir "checkpointpath" --target_columns label
Usage Example:
python train.py --data_path ./toy_dataset/novel_pair_0_train.csv --separate_val_path ./toy_dataset/novel_pair_0_val.csv --separate_test_path ./toy_dataset/novel_pair_0_test.csv --metric mse --dataset_type regression --save_dir regression_150_newprot_pre --target_columns label --epochs 150 --ensemble_size 3 --num_folds 1 --batch_size 50 --aggregation mean --dropout 0.1 --save_preds
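When the same training command has to be repeated for several data splits, the argument list can be assembled in Python. This is only a sketch: the function `build_train_command` and the `regression_<epochs>_fold<fold>` directory name are hypothetical, and it assumes that splits follow the `novel_pair_<fold>_{train,val,test}.csv` naming seen in the toy dataset. Only flags that appear in the commands above are used.

```python
import shlex


def build_train_command(fold, data_dir="./toy_dataset", epochs=150):
    """Assemble the train.py argument list for one split (hypothetical helper)."""
    # Assumption: files are named novel_pair_<fold>_{train,val,test}.csv.
    prefix = f"{data_dir}/novel_pair_{fold}"
    return [
        "python", "train.py",
        "--data_path", f"{prefix}_train.csv",
        "--separate_val_path", f"{prefix}_val.csv",
        "--separate_test_path", f"{prefix}_test.csv",
        "--metric", "mse",
        "--dataset_type", "regression",
        # Hypothetical checkpoint directory name, one per split.
        "--save_dir", f"regression_{epochs}_fold{fold}",
        "--target_columns", "label",
        "--epochs", str(epochs),
        "--save_preds",
    ]


command = build_train_command(0)
# Print the command as a shell-safe string for inspection.
print(shlex.join(command))
```

To launch the run, the list can be passed to `subprocess.run(command, check=True)` from the repository root.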
To predict with a trained model:
python predict.py --test_path "testdatapath" --checkpoint_dir "checkpointpath" --preds_path "predictionpath.csv"
Usage Example:
python predict.py --test_path ./toy_dataset/novel_pair_0_test.csv --checkpoint_dir regression_150_newprot_pre --preds_path newnew_fold0.csv
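Once predictions are written, they can be scored against the true labels with the same metric used for training (MSE). The sketch below makes two assumptions that should be checked against the actual output: that the prediction file stores predicted values in a `label` column, and that its rows are in the same order as the test file. The helper names and the example values are hypothetical.

```python
import csv
import io


def read_labels(handle, column="label"):
    """Read one numeric column from a CSV file (hypothetical helper)."""
    return [float(row[column]) for row in csv.DictReader(handle)]


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up files, for illustration only.
# Assumption: the prediction CSV keeps predictions in the 'label' column.
truth = io.StringIO("smiles,sequences,label\nCCO,MKT,5.0\nCCN,MSL,7.0\n")
preds = io.StringIO("smiles,sequences,label\nCCO,MKT,5.5\nCCN,MSL,6.5\n")

mse = mean_squared_error(read_labels(truth), read_labels(preds))
print(mse)
```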
Your data must be in CSV format with the column names 'smiles', 'sequences', and 'label'.
You can freely tune the hyperparameters for the best performance (we highly recommend using a Bayesian optimization package).
If you find the models useful in your research, please consider citing the relevant paper:
@article{10.1093/bioinformatics/btac731,
author = {Nguyen, Ngoc-Quang and Jang, Gwanghoon and Kim, Hajung and Kang, Jaewoo},
title = "{Perceiver CPI: A nested cross-attention network for compound-protein interaction prediction}",
journal = {Bioinformatics},
year = {2022},
month = {11},
abstract = "{Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information. We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA, and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments. Perceiver CPI is available at https://github.com/dmis-lab/PerceiverCPI. Supplementary data are available at Bioinformatics online.}",
issn = {1367-4803},
doi = {10.1093/bioinformatics/btac731},
url = {https://doi.org/10.1093/bioinformatics/btac731},
note = {btac731},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac731/47214739/btac731.pdf},
}