SARS-CoV-2 sequence predictor

COVID-19 is a serious global health problem, and producing vaccines against evolved SARS-CoV-2 variants is an important challenge. We therefore aim to predict the sequence evolution of the SARS-CoV-2 spike protein using deep learning. The pipeline, methods, and code here were adapted from flu-forecaster, developed by Eric Ma. Some of the code was partially rewritten so that it can run on Google Colab.

Data

The SARS-CoV-2 sequence data came from the NCBI Virus database (IRD). The search parameters were as follows:

Species: Severe acute respiratory syndrome coronavirus 2
Sequence Length: 1273
Nucleotide completeness: complete
Protein: Surface glycoprotein
Collection date: 2020/5/1~2021/12/31
Geographic region: North America
Isolation source: oronasopharynx
Host: Homo (humans)

Alternatively, download the sequence file "sequences_2020May_to_2021Dec.fasta" and place it in the folder "data_covid19" before running the code.
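
As a minimal sketch of how the downloaded file could be read, the snippet below parses FASTA-formatted text and keeps only records whose sequence has the length 1273 given in the search parameters. The names `read_fasta` and `filter_by_length` are illustrative and are not taken from the notebook, which may load the data differently.

```python
from typing import Dict, Iterable

SPIKE_LENGTH = 1273  # sequence length listed in the search parameters


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    A later record with the same header overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def filter_by_length(records: Dict[str, str],
                     length: int = SPIKE_LENGTH) -> Dict[str, str]:
    """Keep only the records whose sequence has exactly `length` residues."""
    return {h: s for h, s in records.items() if len(s) == length}


# Hypothetical usage, assuming the file sits in the folder named above:
# with open("data_covid19/sequences_2020May_to_2021Dec.fasta") as handle:
#     spike_records = filter_by_length(read_fasta(handle))
```

The parser takes any iterable of lines, so it works on an open file handle as well as on a list of strings.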

Structure

Use a variational autoencoder (VAE), a deep learning method, to learn a latent manifold on which sequence evolution takes place.
Simultaneously construct a genotype network of SARS-CoV-2 evolution.
1. Nodes: SARS-CoV-2 protein sequences.
2. Edges: Connect pairs of sequences that differ by one amino acid.
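
The genotype network described above can be sketched in plain Python. This is an illustration under one assumption: all sequences have equal length (1273 here), so "differ by one amino acid" is read as a single substitution, i.e. a Hamming distance of 1. The function names are hypothetical and the notebook may build the network differently, for example with a graph library.

```python
from itertools import combinations
from typing import Dict, List, Set


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_genotype_network(sequences: List[str]) -> Dict[str, Set[str]]:
    """Return an adjacency mapping: each unique sequence is a node, and two
    nodes are linked when they differ by exactly one amino acid."""
    unique = sorted(set(sequences))  # duplicates collapse into one node
    network: Dict[str, Set[str]] = {seq: set() for seq in unique}
    for a, b in combinations(unique, 2):
        if hamming_distance(a, b) == 1:
            network[a].add(b)
            network[b].add(a)
    return network
```

The pairwise comparison is quadratic in the number of unique sequences, which is acceptable for a sketch but may be slow on the full dataset.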
Sanity checks:
1. Plot the edit distance between randomly chosen pairs of protein sequences against their distance on the manifold. The relationship between the two should be linear.
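
A numeric counterpart to this plot is the Pearson correlation between the two distances. The sketch below assumes that manifold distance means Euclidean distance between latent coordinates, which the text does not state, and uses Levenshtein distance as the edit distance. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def distance_correlation(sequences: List[str],
                         latents: List[Sequence[float]],
                         pairs: List[Tuple[int, int]]) -> float:
    """Correlate edit distance with latent distance over index pairs."""
    edit = [levenshtein(sequences[i], sequences[j]) for i, j in pairs]
    latent = [euclidean(latents[i], latents[j]) for i, j in pairs]
    return pearson(edit, latent)
```

A coefficient close to 1 would be consistent with the linear relationship expected by the sanity check; the plot itself remains useful for spotting nonlinear structure.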
Validation:
1. MVP validation will consist of one round of "back testing": we hold out data from 2021/8/1 to 2021/12/31 and check whether the predicted sequences show up in it.
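
The back test can be sketched as a date-based split followed by a set comparison. This is an illustration only: the record layout (a collection date paired with a sequence), the function names, and the choice to count a prediction as a hit only when it is absent from the training data are all assumptions, not details taken from the notebook.

```python
from datetime import date
from typing import Iterable, List, Set, Tuple

HOLDOUT_START = date(2021, 8, 1)   # first day of the held-out window
HOLDOUT_END = date(2021, 12, 31)   # last day of the held-out window


def split_by_date(
    records: Iterable[Tuple[date, str]],
) -> Tuple[List[str], List[str]]:
    """Split (collection_date, sequence) records into training sequences
    (before the window) and held-out sequences (inside the window).
    Records dated after the window are ignored."""
    train: List[str] = []
    holdout: List[str] = []
    for collected, sequence in records:
        if HOLDOUT_START <= collected <= HOLDOUT_END:
            holdout.append(sequence)
        elif collected < HOLDOUT_START:
            train.append(sequence)
    return train, holdout


def backtest_hits(predicted: Iterable[str],
                  train: Iterable[str],
                  holdout: Iterable[str]) -> Set[str]:
    """Predicted sequences that occur in the held-out data but were not
    already present in the training data (assumed definition of a hit)."""
    return (set(predicted) & set(holdout)) - set(train)
```

Excluding sequences already seen in training keeps the check from rewarding predictions that merely repeat known variants.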

Spheluo/SARS-CoV2-sequence-predictor