GTE: A Graph Learning Framework for Prediction of T-cell Receptors and Epitopes Binding Specificity

Welcome to GTE, a graph learning framework for predicting the binding specificity between T-cell receptors (TCRs) and epitopes.

Folder Structure

The project's folder structure is as follows:

  • models folder:

    The 'models' folder contains the saved models generated by GTE. It covers four datasets, each partitioned with both the RandomTCR and StrictTCR splits and trained with 5-fold cross-validation, for a total of 40 models (4 datasets × 2 splits × 5 folds).

    Model files are named XXXXX_0123_4, where XXXXX is the dataset name, 0123 lists the folds used for training, and 4 is the held-out fold used for testing.

    You can download our 40 models for inference here.

  • processed_data folder:

    This folder contains the raw data for each dataset and the pre-processed 5-fold data. These data are used for training and testing the models.

  • results folder:

    This folder stores the models' predictions on each dataset. These results can be used to analyze model performance and to generate further visualizations and reports.
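The model naming convention described above can be sketched in Python as follows. This is only an illustration: the helper names `parse_model_name` and `expected_model_names` are hypothetical and not part of GTE, the file extension (if any) is assumed to have been stripped already, and the rotation of the held-out fold across the five models is an assumption based on the 5-fold setup.

```python
from typing import List, NamedTuple, Tuple


class ModelName(NamedTuple):
    dataset: str
    train_folds: Tuple[int, ...]
    test_fold: int


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'pMTnet_0123_4' into its three parts.

    Hypothetical helper: assumes the extension has already been removed.
    """
    # Split from the right so a dataset name containing '_' stays intact.
    dataset, train_part, test_part = name.rsplit("_", 2)
    if not (train_part.isdigit() and test_part.isdigit()):
        raise ValueError(f"unexpected model name: {name!r}")
    return ModelName(dataset, tuple(int(c) for c in train_part), int(test_part))


def expected_model_names(dataset: str, n_folds: int = 5) -> List[str]:
    """List one name per held-out fold (assumed rotation, not confirmed by GTE)."""
    names = []
    for test_fold in range(n_folds):
        train = "".join(str(f) for f in range(n_folds) if f != test_fold)
        names.append(f"{dataset}_{train}_{test_fold}")
    return names
```

For example, `parse_model_name("pMTnet_0123_4")` yields the dataset `pMTnet`, training folds 0 to 3, and test fold 4.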

Quick Start

  1. Create a Conda Environment:

    Start by creating a Conda environment with Python 3.11. If you haven't already installed Conda, you can get it from Anaconda.

    conda create -n GTE python=3.11

    Activate the environment:

    conda activate GTE
  2. Install Dependencies:

    Use pip to install the required packages listed in the requirements.txt file.

    pip install -r requirements.txt
  3. How to Run:

    To run inference with the saved models, use the following command:

    python inference.py --split RandomTCR --dataset pMTnet 

    Available options:

    • --split:

      • Default: "RandomTCR"
      • Choices: ["RandomTCR", "StrictTCR"]
    • --dataset:

      • Default: "pMTnet"
      • Choices: ["McPAS", "pMTnet", "VDJdb", "TEINet"]
    • --device:

      • Default: "cpu"
      • Choices: ["cpu", "gpu"]
    • --gpu_id:

      • Default: 0
      • Description: When using a GPU, this specifies which GPU to use by its ID. The default is the first GPU (ID 0).
      • Example:
        python inference.py --split RandomTCR --dataset pMTnet --device gpu --gpu_id 0
  4. Example Output:

    You chose the dataset: pMTnet
    The split method is: RandomTCR
    Fold: 0, AUC: 0.9113, AUPR: 0.6501
    Fold: 1, AUC: 0.9098, AUPR: 0.6438
    Fold: 2, AUC: 0.9079, AUPR: 0.6438
    Fold: 3, AUC: 0.9077, AUPR: 0.6404
    Fold: 4, AUC: 0.9111, AUPR: 0.6512
  5. Additional Information:

    For more details and customization options, please refer to our paper. Have fun exploring the GTE framework!
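If you want a single summary number per run, the per-fold lines shown in the example output can be averaged with a few lines of Python. This is a minimal sketch, not part of GTE: the `summarize` function and the regular expression are assumptions that rely on the output format staying exactly as printed above.

```python
import re
from statistics import mean

# Sample text copied from the example output in this README.
SAMPLE = """\
You chose the dataset: pMTnet
The split method is: RandomTCR
Fold: 0, AUC: 0.9113, AUPR: 0.6501
Fold: 1, AUC: 0.9098, AUPR: 0.6438
Fold: 2, AUC: 0.9079, AUPR: 0.6438
Fold: 3, AUC: 0.9077, AUPR: 0.6404
Fold: 4, AUC: 0.9111, AUPR: 0.6512
"""

# Assumed line format: "Fold: <k>, AUC: <float>, AUPR: <float>".
FOLD_LINE = re.compile(r"Fold:\s*(\d+),\s*AUC:\s*([0-9.]+),\s*AUPR:\s*([0-9.]+)")


def summarize(output: str) -> dict:
    """Average AUC and AUPR over all fold lines found in the text."""
    rows = [(int(f), float(a), float(p)) for f, a, p in FOLD_LINE.findall(output)]
    if not rows:
        raise ValueError("no 'Fold: ...' lines found")
    return {
        "folds": len(rows),
        "mean_auc": mean(r[1] for r in rows),
        "mean_aupr": mean(r[2] for r in rows),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To use it on a real run, capture the console output of inference.py into a string (for example by redirecting it to a file) and pass that string to `summarize`.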

How to Train

The downloaded test model contains embeddings generated by TCRpeg. If you need embeddings from ESM-2, please refer to ESM-2's GitHub.

Next, simply run the following command:

python train.py --gpu 0 --configs_path configs/pMTnet.yml --droup_out 0.1 --split StrictTCR

Please ensure that the paths in configs/XXXXX.yml (where XXXXX is the dataset name) are correct, including the paths to the training and testing files and to the embeddings.
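A quick way to catch a wrong path before training is to scan the loaded config for path-like values that do not exist on disk. The sketch below is a generic illustration and makes several assumptions: it operates on an already loaded dictionary (for example the result of PyYAML's `yaml.safe_load`), the key names in the usage example are hypothetical rather than GTE's actual config keys, and any string containing a path separator is treated as a path.

```python
import os
from typing import Any, Iterator, List, Tuple


def iter_string_values(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string leaf in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_string_values(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_string_values(value, f"{prefix}[{index}]")
    elif isinstance(node, str):
        yield prefix, node


def find_missing_paths(config: dict) -> List[Tuple[str, str]]:
    """Return (key, value) pairs that look like paths but do not exist."""
    missing = []
    for key, value in iter_string_values(config):
        # Heuristic (assumption): a string with a separator is a path.
        looks_like_path = "/" in value or os.sep in value
        if looks_like_path and not os.path.exists(value):
            missing.append((key, value))
    return missing


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    example = {"train_file": "processed_data/pMTnet/train.csv", "lr": 0.001}
    for key, value in find_missing_paths(example):
        print(f"missing: {key} -> {value}")
```

Running the check from the repository root before calling train.py keeps relative paths consistent with how the training script would resolve them.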