Skip to content

Latest commit

 

History

History
29 lines (18 loc) · 1.49 KB

README.md

File metadata and controls

29 lines (18 loc) · 1.49 KB

Evolutionary Information Language Model

This is a bi-directional language model (LM) embedding the protein sequence in a numerical representation encoding biophysical, biochemical, and the evolutionary information of the protein. The pretrained weights are availble in a sub-branch of the repo, 'Evol_Info_Embedder', trained on a set of 2 Million proteins in a cluster of GPUs granted by Google.

Publication

If you use PEvoLM in your work, please cite the following publication:

  • I. Arab, PEvoLM: Protein Sequence Evolutionary Information Language Model, 2023 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Eindhoven, Netherlands, 2023, pp. 1-8, doi:10.1109/CIBCB56990.2023.10264890

Protein Sequence Evolutionary Information Language Model (PEvoLM)

Requirements

  • Python>=3.6.5
  • torch>=1.2.0
  • allennlp v0.9

seq_vec_model

The ELMo model trained on UniRef50 (=SeqVec) is available at: SeqVec-model

Please download the weights, copy the file, and paste it in the folder "seq_vec_model"