-
Notifications
You must be signed in to change notification settings - Fork 17
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' of https://github.com/A4Bio/ProDesign
- Loading branch information
Showing
10 changed files
with
57 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,24 @@ | ||
# ProDesign: Toward effective and efficient protein design | ||
# PiFold: Toward effective and efficient protein inverse folding | ||
|
||
![GitHub stars](https://img.shields.io/github/stars/A4Bio/PiFold) ![GitHub forks](https://img.shields.io/github/forks/A4Bio/PiFold?color=green) <!-- ![visitors](https://visitor-badge.glitch.me/badge?page_id=A4Bio.PiFold) --> | ||
|
||
**PiFold: Toward effective and efficient protein inverse folding (Spotlight)** | ||
[Zhangyang Gao](https://westlake-drug-discovery.github.io/zhangyang_gao.html), [Cheng Tan](https://chengtan9907.github.io/), [Stan Z. Li](https://scholar.google.com/citations?user=Y-nyLGIAAAAJ&hl). In [ICLR](https://openreview.net/forum?id=oMsN9TYwJ0j), 2023. | ||
|
||
Note that we renamed **ProDesign** to **PiFold** to avoid naming conflicts with previous work. | ||
|
||
|
||
The pre-print paper is available at [this link](https://github.com/A4Bio/ProDesign/blob/main/assets/ProDesign.pdf). | ||
|
||
## 1. Introduction | ||
How to design protein sequences folding into the desired structures effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose ProDesign, which contains a novel residue featurizer and ProGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that ProDesign could achieve 51.66\% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, ProDesign achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. | ||
How can we design protein sequences folding into the desired structures effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66\% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. | ||
|
||
<p align="center"> | ||
<img src='./assets/acc_speed.png' width="600"> | ||
<img src='./assets/acc_speed2.png' width="600"> | ||
</p> | ||
|
||
## 2. Framework | ||
We show the overall ProDesign framework. The inputs are protein structures, and outputs are protein sequences expected to fold into the input structures. We propose a novel residue featurizer and ProGNN layer to learn expressive residue representations. Specifically, the residue featurizer constructs comprehensive residue features and creates learnable virtual atoms to capture information complementary to real atoms. The ProGNN | ||
considers multi-scale residue interactions in node, edge, and global context levels. ProDesign could generate protein sequences in a one-shot manner with a higher recovery than previous autoregressive or iterative models. | ||
We show the overall PiFold framework. The inputs are protein structures, and outputs are protein sequences expected to fold into the input structures. We propose a novel residue featurizer and PiGNN layer to learn expressive residue representations. Specifically, the residue featurizer constructs comprehensive residue features and creates learnable virtual atoms to capture information complementary to real atoms. The PiGNN | ||
considers multi-scale residue interactions in node, edge, and global context levels. PiFold could generate protein sequences in a one-shot manner with a higher recovery than previous autoregressive or iterative models. | ||
|
||
<p align="center"> | ||
<img src='./assets/framework.png' width="600"> | ||
|
@@ -21,14 +28,14 @@ considers multi-scale residue interactions in node, edge, and global context lev | |
We comprehensively evaluate different results on CATH, TS50 and TS500. | ||
|
||
<p align="center"> | ||
<img src='./assets/results_CATH.png' width="600"> | ||
<img src='./assets/results_CATH2.png' width="600"> | ||
</p> | ||
|
||
<p align="center"> | ||
<img src='./assets/results_TS.png' width="600"> | ||
</p> | ||
|
||
You can reproduce results of ProDesign on colab: | ||
You can reproduce results of PiFold on colab: | ||
|
||
<a href="https://colab.research.google.com/drive/1HgXQCbsoK09mcVZmPgIWlCczY64l0iIX?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | ||
|
||
|
@@ -43,6 +50,27 @@ For a given protein backbone design a new sequence that folds into that conforma | |
<a href="https://colab.research.google.com/drive/1z6vpKA5L1iAmBLfREbmy8VNOtDYlkY4Q?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> | ||
<!-- [[Colab]](https://colab.research.google.com/drive/1z6vpKA5L1iAmBLfREbmy8VNOtDYlkY4Q?usp=sharing) --> | ||
|
||
## Citation | ||
|
||
If you are interested in our repository and our paper, please cite the following paper: | ||
|
||
``` | ||
@inproceedings{ | ||
gao2023pifold, | ||
title={PiFold: Toward effective and efficient protein inverse folding}, | ||
author={Zhangyang Gao and Cheng Tan and Stan Z. Li}, | ||
booktitle={International Conference on Learning Representations}, | ||
year={2023}, | ||
url={https://openreview.net/forum?id=oMsN9TYwJ0j} | ||
} | ||
@article{gao2022pifold, | ||
title={PiFold: Toward effective and efficient protein inverse folding}, | ||
author={Gao, Zhangyang and Tan, Cheng and Li, Stan Z}, | ||
journal={arXiv preprint arXiv:2209.12643}, | ||
year={2022} | ||
} | ||
``` | ||
|
||
## Feedback | ||
If you have any issue about this work, please feel free to contact me by email: | ||
* Zhangyang Gao: [email protected] | ||
|
Binary file not shown.
Binary file not shown.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2022 PiFold | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |