Commit

remove mhcflurry
gcroci2 committed Sep 18, 2023
1 parent 8e9c45e commit c19d97a
Showing 2 changed files with 1 addition and 8 deletions.
paper/paper.bib (7 changes: 0 additions & 7 deletions)
@@ -205,13 +205,6 @@ @INPROCEEDINGS {deepatom
 month = {nov}
 }
 
-@article{2020mhcflurry,
-title={MHCflurry 2.0: Improved Pan-Allele Prediction of MHC Class I-Presented Peptides by Incorporating Antigen Processing, Cell Syst. 11 (2020) 42-48. e7},
-author={O'Donnell, TJ and Rubinsteyn, A and Laserson, U},
-journal={P42-P48. e7},
-year={2020}
-}
-
 @article{torchdrug,
 title={TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery},
 author={Zhu, Zhaocheng and Shi, Chence and Zhang, Zuobai and Liu, Shengchao and Xu, Minghao and Yuan, Xinyu and Zhang, Yangtian and Chen, Junkun and Cai, Huiyu and Lu, Jiarui and Ma, Chang and Liu, Runcheng and Xhonneux, Louis-Pascal and Qu, Meng and Tang, Jian},

paper/paper.md (2 changes: 1 addition & 1 deletion)
@@ -74,7 +74,7 @@ In the past decades, a variety of experimental methods (e.g., X-ray crystallogra
 # Statement of need
 
 [comment]: <> (Motivation for a flexible framework)
-Data mining 3D structures of proteins presents several challenges. These include complex physico-chemical rules governing structural features, the possibility of characterizartion at different scales (e.g., atom-level, residue level, and secondary structure level), and the large diversity in shape and size. Furthermore, because a structures can easily comprise of hundreds to thousands of residues (and ~15 times as many atoms), efficient processing and featurization of many structures is critical to handle the computational cost and file storage requirements. Existing software solutions are often highly specialized and not developed as reusable and flexible frameworks, and cannot be easily adapted to diverse applications and predictive tasks. Examples include DeepAtom [@deepatom] for protein-ligand binding affinity prediction only, MaSIF [@masif] for deciphering patterns in protein surfaces, and MHCFlurry 2.0 [@2020mhcflurry] for predicting binding affinity for a specific type of protein-protein complex (the peptide-major histocompatibility complex (MHC)). While some frameworks, such as TorchProtein and TorchDrug [@torchdrug], configure themselves as general-purpose ML libraries for both molecular sequences and 3D structures, they only implement geometric-related features and do not incorporate fundamental physico-chemical information in the 3D representation of molecules.
+Data mining 3D structures of proteins presents several challenges. These include complex physico-chemical rules governing structural features, the possibility of characterizartion at different scales (e.g., atom-level, residue level, and secondary structure level), and the large diversity in shape and size. Furthermore, because a structures can easily comprise of hundreds to thousands of residues (and ~15 times as many atoms), efficient processing and featurization of many structures is critical to handle the computational cost and file storage requirements. Existing software solutions are often highly specialized and not developed as reusable and flexible frameworks, and cannot be easily adapted to diverse applications and predictive tasks. Examples include DeepAtom [@deepatom] for protein-ligand binding affinity prediction only, and MaSIF [@masif] for deciphering patterns in protein surfaces. While some frameworks, such as TorchProtein and TorchDrug [@torchdrug], configure themselves as general-purpose ML libraries for both molecular sequences and 3D structures, they only implement geometric-related features and do not incorporate fundamental physico-chemical information in the 3D representation of molecules.
 
 These limitations create a growing demand for a generic and flexible DL framework that researchers can readily utilize for their specific research questions while cutting down the tedious data preprocessing stages. Generic DL frameworks have already emerged in diverse scientific fields, such as computational chemistry (e.g., DeepChem [@deepchem]) and condensed matter physics (e.g., NetKet [@netket]), which have promoted collaborative efforts, facilitated novel insights, and benefited from continuous improvements and maintenance by engaged user communities.
 
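The two hunks have to land together. Deleting the `2020mhcflurry` entry from `paper.bib` while `paper.md` still contained `[@2020mhcflurry]` would leave an unresolved citation in the built paper. The following is a minimal sketch of a consistency check for that situation. It assumes Pandoc-style `[@key]` citations, as seen in the diff, and BibTeX entries that open as `@type{key,` or `@TYPE {key,`. The regexes are deliberately simplified, the function names are hypothetical, and the sample strings are heavily shortened stand-ins rather than the real files.

```python
import re

# Simplified assumption: a BibTeX entry opens as "@type{key," or "@TYPE {key,"
# (paper.bib contains both forms, e.g. "@INPROCEEDINGS {deepatom").
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# Simplified assumption: citations use Pandoc syntax, e.g. "[@deepatom]" or "[@a; @b]".
CITE_GROUP = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
CITE_KEY = re.compile(r"@([\w:\-]+)")


def bib_keys(bib_text: str) -> set:
    """Return every entry key defined in the .bib text."""
    return set(BIB_ENTRY.findall(bib_text))


def cited_keys(md_text: str) -> set:
    """Return every key cited inside a bracketed [@...] group of the Markdown text."""
    keys = set()
    for group in CITE_GROUP.findall(md_text):
        keys.update(CITE_KEY.findall(group))
    return keys


def dangling_citations(md_text: str, bib_text: str) -> list:
    """Keys cited in the Markdown that have no entry in the .bib text."""
    return sorted(cited_keys(md_text) - bib_keys(bib_text))


def unused_entries(md_text: str, bib_text: str) -> list:
    """Keys defined in the .bib text that the Markdown never cites."""
    return sorted(bib_keys(bib_text) - cited_keys(md_text))


# Hypothetical, shortened stand-in for paper.bib after this commit.
BIB_AFTER = """
@INPROCEEDINGS {deepatom,
  month = {nov}
}
@article{torchdrug,
  title={TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery},
}
"""

# Hypothetical, shortened stand-ins for the changed paper.md paragraph.
MD_BEFORE = ("Examples include DeepAtom [@deepatom] (...) and MHCFlurry 2.0 "
             "[@2020mhcflurry] (...) TorchDrug [@torchdrug] (...)")
MD_AFTER = "Examples include DeepAtom [@deepatom] (...) TorchDrug [@torchdrug] (...)"

print(dangling_citations(MD_BEFORE, BIB_AFTER))  # ['2020mhcflurry']
print(dangling_citations(MD_AFTER, BIB_AFTER))   # []
print(unused_entries(MD_AFTER, BIB_AFTER))       # []
```

Running both directions catches two kinds of drift. `dangling_citations` flags a bib entry removed too early. `unused_entries` flags the reverse case, an entry left behind after its citation is gone. For real projects, Pandoc's own citation-processing warnings are the authoritative check. This sketch only approximates that check.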
