This repository contains code for predicting physical interaction of proteins using direct coupling analysis and graph neural networks.
This code was designed to run on the s3it cluster to avoid out-of-memory (OOM) issues.
Because of the large datasets involved, data loading and training can take a long time. To run on the s3it GPU cluster, make sure a GPU-compatible version of TensorFlow is installed before running the code. Please see the s3it GPU instructions.
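Before submitting a long job, it can help to confirm that TensorFlow actually sees a GPU. The snippet below is a minimal sketch rather than part of this repository: the helper name `visible_gpus` is hypothetical, and it relies only on the public `tf.config.list_physical_devices` call.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not importable.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow is installed but found no GPU device.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("TensorFlow is installed but sees no GPU.")
```

If the last message appears on a GPU node, the installed TensorFlow build is probably not GPU-compatible with the drivers or CUDA libraries on that node.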
This requires a working installation of Anaconda or Miniconda. Create the Python environment as described below:
cd configs
conda env create -f env.yml
conda activate gcn_env
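Once the environment is active, a quick check along these lines can confirm that the main dependencies are importable. This is a sketch under assumptions: the package names `tensorflow` and `spektral` are inferred from this README rather than read from `env.yml`, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed top-level package names, inferred from this README (not read from env.yml).
REQUIRED = ("tensorflow", "spektral")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

`find_spec` only locates the packages without importing them, so the check stays fast even though TensorFlow is slow to import.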
You can find the data paths within the gcn_generator.py file (see scripts directory).
To generate the inter-protein graphs, run:
cd src/scripts
bash run_graph_generator.sh
To train and evaluate the model, run:
cd src/scripts
bash run_graph_prediction.sh
Running the GCN on the E. coli dataset shows a learnable signal and good predictive performance.
Models | Resources |
---|---|
Neural Net (Spektral GCN) | Spektral model |
There may be issues running this code from a local machine. It was designed to run on the s3it cluster.
The conda environment provided should contain all of these requirements. If it does not, you can install them from the following sources.
Dependency | Installation |
---|---|
Spektral (CPU, Linux) | PyPI |