SARS-CoV-2_Bioinformatics

1. genome/genomics (Guangyuan)

Data resources

GISAID (Global Initiative on Sharing All Influenza Data) International database of hCoV-19 genome sequences and related clinical and epidemiological data

Y. Shu and J. McCauley, GISAID: global initiative on sharing all influenza data – from vision to reality, Eurosurveillance, vol. 22, iss. 13, 2017.
EMBL-EBI: Covid-19 Data Portal The Covid-19 Data Portal developed and hosted by EMBL-EBI brings together relevant datasets for sharing and analysis in an effort to accelerate coronavirus research. It enables researchers to upload, access and analyse COVID-19 related reference data and specialist datasets as part of the wider European COVID-19 Data Platform. It includes some tools as well.
ENA (European Nucleotide Archive) ENA lists data held at EMBL-EBI relating to the COVID-19 outbreak, including sequences of outbreak isolates and records relating to coronavirus biology. In the coming weeks, these data will be included in EMBL-EBI’s new dedicated resource for COVID-19 data, the COVID-19 Portal.
China National Center for Bioinformation (2019nCoVR) 2019nCoVR features comprehensive integration of genomic and proteomic sequences as well as their metadata information from the GISAID, NCBI, NMDC and CNCB/NGDC. It also incorporates a wide range of relevant information including scientific literature, news, and popular articles for science dissemination, and provides visualization functionalities for genome variation analysis results based on all collected 2019-nCoV strains.

Zhao WM, Song SH, Chen ML, et al. The 2019 novel coronavirus resource. Yi Chuan. 2020;42(2):212–221. doi:10.16288/j.yczz.20-030 [PMID: 32102777]
GenBank Nucleotide Sequences Provides rapid, open, and unrestricted access to virus nucleotide sequences and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
GenBank Protein Sequences Provides rapid, open, and unrestricted access to virus conceptually translated protein sequences and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
NCBI Virus: SARS-CoV-2 data hub SARS-CoV-2 focused content from NCBI Virus, including links to related resources. Search, filter, and download the most up-to-date nucleotide and protein sequences from GenBank and RefSeq (taxid 2697049). Generate multiple sequence alignments and phylogenetic trees for sequences of interest. Provides one-click access to the Betacoronavirus BLAST database and relevant literature in PubMed.
ViPR SARS-CoV-2 data portal | Virus Pathogen Resource The ViPR database integrates various types of data for multiple virus families. You can search the comprehensive database for sequences & strains, immune epitopes, 3D protein structures, host factor data, antiviral drugs, plasmid data. Further you can analyze the data online using sequence alignment, phylogenetic tree reconstruction, sequence variation (SNP), metadata-driven comparative analysis and BLAST. Visit the SARS-CoV-2 data portal in ViPR.

B. E. Pickett, E. L. Sadat, Y. Zhang, J. M. Noronha, B. R. Squires, V. Hunt, M. Liu, S. Kumar, S. Zaremba, Z. Gu, L. Zhou, C. N. Larson, J. Dietrich, E. B. Klem, and R. H. Scheuermann, ViPR: an open bioinformatics database and analysis resource for virology research, Nucleic Acids Res, vol. 40, iss. D1, p. D593–D598, 2011.
INB/ELIXIR-ES and TransBioNet: COVID-19 research
Nextstrain COVID-19 genetic epidemiology Open-source SARS-CoV-2 genome data and analytic and visualization tools
Sequence Read Archive (SRA) Provides rapid, open, and unrestricted access to virus nucleotide or metagenomic sequence data and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query.
COVID-19 Genome Tracker
ViralZone SARS-CoV-2 protein seqs available at ViralZone
CoV-GLUE an online resource for comparative genomic analysis
Twist Bioscience Twist is offering two fully-synthetic SARS-CoV-2 RNA controls, available for distinct reference sequences: Twist Synthetic SARS-CoV-2 RNA Control 1 (MT007544.1), Twist Synthetic SARS-CoV-2 RNA Control 2 (MN908947.3). The Twist synthetic controls are designed based on two specific SARS-CoV-2 variants, cover the full viral genome and are sequence-verified. In addition, Twist is able to create synthetic RNA controls from other strains or sequences of the virus, and can provide these custom controls within two weeks.
Materials and Methods from Labome
PubMed trending research papers (SARS-CoV-2)
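
The NCBI entries above identify SARS-CoV-2 by taxid 2697049. As a minimal sketch of how that identifier can be used programmatically, the snippet below builds an NCBI E-utilities ESearch URL for records under that taxon. The helper name `build_esearch_url`, the choice of the `nuccore` database, and the `txid…[Organism:exp]` query term are assumptions made for illustration; they are not taken from this README, and the snippet only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
SARS_COV_2_TAXID = 2697049  # taxid quoted in the NCBI Virus entry


def build_esearch_url(db="nuccore", taxid=SARS_COV_2_TAXID, retmax=20):
    """Return an ESearch URL that lists record IDs for one taxon.

    Hypothetical helper: the query term below is an assumed way of
    selecting a taxon and its descendants, not a prescribed one.
    """
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism:exp]",
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it is left to the caller.
    print(build_esearch_url())
```

As noted for GenBank and SRA, newly submitted records may take some time to become retrievable through a term-based query such as this one.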

Tools - Detection, Reconstruction, Identification

PriSeT | Efficient De Novo Primer Discovery Appropriate PCR primer pairs for DNA metabarcoding would match to a broad evolutionary range of taxa, such that we only need a few to achieve high taxonomic coverage. At the same time, the DNA barcodes between primer pairs should be different to allow us to distinguish between species to improve resolution. PriSeT finds a primer set P balancing both: high taxonomic coverage and high resolution. It is capable of processing large libraries and is robust against mislabeled or low quality references. It tackles the computationally expensive steps with linear runtime filters and efficient encodings. PriSeT has been applied to 19 SARS-CoV-2 genomes and computed 114 new primer pairs with the additional constraint that the sequences have no co-occurrences in other taxa. These primer sets would be suitable for empirical testing.

M. Hoffmann, M. T. Monaghan, and K. Reinert, PriSeT: efficient de novo primer discovery, bioRxiv, 2020.
CoVPipe | reference-based reconstruction of SARS-CoV-2 genomes CoVPipe is a highly optimized and fully automated workflow for the reference-based reconstruction of SARS-CoV-2 genomes based on next generation amplicon sequencing data using CleanPlex SARS-CoV-2 Panel (Paragon Genomics, Hayward, CA, USA) from swab samples. The pipeline is designed for reproducibility and scalability in order to ensure reliable and fast data analysis of SARS-CoV-2 data.
poreCov The nanopore workflow poreCov carries out all necessary steps from basecalling to assembly depending on the user input, followed by lineage prediction of each genome using Pangolin. Furthermore, read coverage plots are provided for each genome to assess the amplification quality of the multiplex PCR. In addition, poreCov includes a quick time tree-based analysis of the inputs against reference sequences. poreCov is implemented in nextflow for full parallelization of the workload and stable sample processing.
V-Pipe | Mining viral genomes and improve clinical diagnostics V-Pipe has released a new version specifically adapted to analyze high-throughput sequencing data of SARS-CoV-2. It allows for the detection of within-host genetic variation of SARS-CoV-2 from viral NGS data.

L. A. Carlisle, T. Turk, K. Kusejko, K. J. Metzner, C. Leemann, C. Schenkel, N. Bachmann, S. Posada, N. Beerenwinkel, J. Böni, S. Yerly, T. Klimkait, M. Perreau, D. L. Braun, A. Rauch, A. Calmy, M. Cavassini, M. Battegay, P. Vernazza, E. Bernasconi, H. F. Günthard, R. D. Kouyos, and Swiss HIV Cohort Study, Viral diversity from next-generation sequencing of HIV-1 samples provides precise estimates of infection recency and time since infection., J Infect Dis, 2019.
VIRify VIRify can be used for the identification of coronaviruses in clinical and environmental samples. VIRify is a recently developed, generic pipeline for the detection, annotation, and taxonomic classification of viral and phage contigs in metagenomic and metatranscriptomic assemblies. VIRify’s taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs. Included in this profile HMM database are 139 models that serve as specific markers for taxa within the Coronaviridae family.
VBRC Tools for Coronaviruses The VBRC was developed for dsDNA viruses but has been adapted for coronaviruses. Only SARS-CoV-2 and closely related viruses will be added to this database. The VBRC provides unique tools that may be useful for the analysis of SARS-CoV-2.
VIRULIGN | Fast codon-correct alignment and annotation of viral genomes VIRULIGN is built for fast codon-correct alignments of large datasets, with standardized and formalized genome annotation and various alignment export formats. VIRULIGN has been adapted to SARS-CoV-2.

P. J. K. Libin, K. Deforche, A. B. Abecasis, and K. Theys, VIRULIGN: fast codon-correct alignment and annotation of viral genomes, Bioinformatics, 2018.
VIGOR4 | Viral Genome ORF Reader VIGOR4 (Viral Genome ORF Reader) is a Java application to predict protein sequences encoded in viral genomes. VIGOR4 determines the protein coding sequences by sequence similarity searching against curated viral protein databases. VIGOR4 uses the VIGOR_DB project, which currently has databases for the following viruses: Influenza (A & B for human, avian, and swine, and C for human), West Nile Virus, Zika Virus, Chikungunya Virus, Eastern Equine Encephalitis Virus, Respiratory Syncytial Virus, Rotavirus, Enterovirus, Lassa Mammarenavirus. A SARS-CoV-2 release is coming (May 1st).

S. Wang, J. P. Sundaram, and D. Spiro, VIGOR, an annotation program for small viral genomes, BMC Bioinf, vol. 11, iss. 1, 2010.
Rfam COVID-19 Resources In response to the SARS-CoV-2 outbreak, Rfam produced a special release 14.2 that includes 10 new and 4 revised families that can be used to annotate the SARS-CoV-2 and other Coronavirus genomes with RNA families.
Covidex | Alignment-free machine learning subtyping for viral species Covidex is an alignment-free machine learning subtyping tool for viral species, based on a random forest model trained over a k-mer database. Currently, it supports FMDV and SARS-CoV-2 viral sequences. The tool allows fast classification into pre-defined clusters (from the Nextstrain database).
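
Alignment-free tools such as Covidex work on k-mer content rather than on aligned positions. The sketch below illustrates only the feature-extraction idea: it turns a sequence into a fixed-length vector of k-mer frequencies that a classifier could consume. The function name `kmer_profile`, the default `k=3`, and the normalization are assumptions for illustration and do not reflect Covidex's actual settings or code.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Return relative frequencies of all 4**k DNA k-mers in a fixed order.

    Windows containing symbols other than A, C, G, T (for example N)
    are ignored. RNA input is accepted by mapping U to T.
    """
    seq = seq.upper().replace("U", "T")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate k-mers in lexicographic order so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTTGCA", k=2)
    print(len(profile))  # one entry per possible 2-mer
```

Because every sequence maps to a vector of the same length, genomes of different lengths can be compared without computing an alignment first.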

Phylogenetic Analysis

Nextstrain | Genomic analysis of COVID-19 spread Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. They provide a continually-updated view of publicly available data with powerful analytics and visualizations showing pathogen evolution and epidemic spread.
Phylogenetic Network Analysis Used to Trace COVID-19 Infection Routes Early "evolutionary paths" of COVID-19 in humans were reconstructed using phylogenetic network analysis. By analyzing the first 160 complete virus genomes to be sequenced from human patients, some of the original spread of the new coronavirus has been mapped through its mutations, which create different viral lineages. A mathematical network algorithm was used to visualise all the plausible trees simultaneously.

Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci U S A. 2020;117(17):9241‐9243. doi:10.1073/pnas.2004999117
pangolin | Phylogenetic Assignment of Named Global Outbreak Lineages Pangolin assigns a global lineage to query SARS-CoV-2 genomes by estimating the most likely placement within a phylogenetic tree of representative sequences from all currently defined global SARS-CoV-2 lineages based on the lineage nomenclature.

A. Rambaut, E. C. Holmes, V. Hill, Á. O’Toole, J. McCrone, C. Ruis, L. du Plessis, and O. G. Pybus, A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology, bioRxiv, 2020.
BEAST 2 | Bayesian evolutionary analysis by sampling trees BEAST 2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST 2 uses Markov chain Monte Carlo (MCMC) to average over tree space, so that each tree is weighted proportional to its posterior probability. BEAST 2 includes a graphical user-interface for setting up standard analyses and a suite of programs for analysing the results.

R. Bouckaert, T. G. Vaughan, J. Barido-Sottani, S. Duchêne, M. Fourment, A. Gavryushkina, J. Heled, G. Jones, D. Kühnert, N. D. Maio, M. Matschiner, F. K. Mendes, N. F. Müller, H. A. Ogilvie, L. du Plessis, A. Popinga, A. Rambaut, D. Rasmussen, I. Siveroni, M. A. Suchard, C. Wu, D. Xie, C. Zhang, T. Stadler, and A. J. Drummond, BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis, PLOS Comput Biol, vol. 15, iss. 4, p. e1006650, 2019.
Phylogeographic reconstruction using air transportation data Phylogeographic reconstruction using air transportation data can be used to study the global spread of the SARS-CoV-2 pandemic, especially in the early phases when air travel still substantially contributed to the spread of the virus. The method is currently adapted to consider both air travel and local movement data within countries during inference to reflect the changing worldwide movements in different phases of the pandemic.

S. Reimering, S. Muñoz, and A. C. McHardy, Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic, PLoS Comput Biol, vol. 16, iss. 2, p. e1007101, 2020.

2. related DNAses (Yongjing)

3. RNA (Youhuang)

Resources

SARS-CoV-2 Sequencing Resources a crowd-sourced collection of information, documentation, protocols and other resources for public health laboratories intending to sequence SARS-CoV-2 coronavirus samples.
SARS-COV-2 (COVID-19) BIOINFORMATICS RESOURCES genexa provides SARS-CoV-2 (COVID-19) resources, including a genome index, unique k-mers, and complete genome assemblies.
Bioinformatics resources for SARS-CoV-2 lists bioinformatics resources useful to track the evolution and progression of the SARS-CoV-2 virus as well as to manage genomics data of the virus.
Coronavirus Tech Handbook A collection of tools, websites and data relating to coronavirus.

tools & databases

COVID-19 Data Portal EMBL-EBI will bring together COVID-19 datasets that have been submitted to its public databases, including ENA, UniProt, PDBe, EMDB, Expression Atlas and Europe PMC. The data, which have so far been collated, include genes, protein structures, electron microscopy data and scientific publications.
UK's SARS-CoV-2 genome sequencing consortium
COVID-19 literature review a Shiny app that provides higher-quality, peer-reviewed current literature on COVID-19.
Genome Detective Coronavirus Typing Tool a webserver to assemble all known virus genomes from next generation sequencing datasets
RNA Structural Covariation Above Phylogenetic Expectation R-scape looks for evidence of a conserved RNA structure by measuring pairwise covariations observed in an input multiple sequence alignment. It was applied to study the evolution of long noncoding RNAs.
RegRNA database a tool for identifying regulatory RNA motifs.
BEAM web server a tool for structural RNA motif discovery.
RNA 3D Motif Atlas.
Rfam COVID-19 Resources Untranslated regions (UTR) are important functional elements that have conserved secondary structure and are responsible for multiple functions, including replication and packaging. Blog.

RNA motifs

The architecture of SARS-CoV-2 transcriptome A high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome. Comparing the ionic current signals (called “squiggles”) of viral transcripts against a negative control revealed at least 41 sites with substantial differences (over 20% frequency), indicating potential RNA modifications. The most frequently observed motif is AAGAA. ‘AAGAA-like’ motifs (including AAGAA and other A/G-rich sequences) are found throughout the viral genome but are particularly enriched in genomic positions 28,500–29,500.
SECReTE cis-acting RNA elements The SARS-CoV-2 genome contains 40 SECReTE (secretion-enhancing cis regulatory targeting element) motifs at an abundance of ~1.3 SECReTEs/kilobase (kb). The motif is "NYN" or "NNY" (where N is any nucleotide and Y = U or C). Mutation of the SECReTE motif enhances or inhibits mRNA stability and association with the ER. This motif appears to promote mRNA stability, localization to the ER, and translation.
An in silico RNA folding map of SARS-CoV-2 The ScanFold program has been used to characterize its RNA folding landscape - highlighting regions of likely structure and function which serve as ideal targets for further analysis. The data is available here.
RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses The authors identified 79 regions of at least 15 nucleotides that are exactly conserved across SARS-related complete genomes, and 106 ‘SARS-CoV-2-conserved-structured’ regions as potential targets for antivirals that bind to structured RNA.
Detection and sequence characterization of the 3'-end of coronavirus genomes harboring the highly conserved RNA motif s2m A remarkably conserved 43-nucleotide-long motif present at the 3'-end of the genomes of several members of the polyadenylated RNA virus families Astroviridae, Coronaviridae, and Picornaviridae can be used for the detection and sequence characterization of the viruses harboring it. The mobile genetic element s2m has the consensus sequence CGNGG(N)CCACGNNGNGT(N)ANNANCGAGGGT(N)ACAG. The function of s2m is unclear.
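
The SECReTE entry above describes a triplet motif, "NYN" or "NNY", and reports its abundance per kilobase. A minimal sketch of how such motifs could be counted is given below. It rests on assumptions not stated in this README: a motif is taken to be a run of at least `min_repeats` consecutive triplets with a pyrimidine at the same triplet position, the default of 10 repeats is an assumed threshold, and triplet positions are counted from the first base of the input. The function names are hypothetical.

```python
PYRIMIDINES = frozenset("CU")


def count_runs(rna, phase, min_repeats=10):
    """Count maximal runs of >= min_repeats pyrimidines at positions
    phase, phase + 3, phase + 6, ... of an RNA string."""
    runs = 0
    length = 0
    for base in rna[phase::3]:
        if base in PYRIMIDINES:
            length += 1
        else:
            if length >= min_repeats:
                runs += 1
            length = 0
    if length >= min_repeats:  # run that reaches the end of the sequence
        runs += 1
    return runs


def count_secrete(seq, min_repeats=10):
    """Count SECReTE-like runs for the two patterns named in the text."""
    rna = seq.upper().replace("T", "U")
    # phase 1 corresponds to NYN triplets, phase 2 to NNY triplets,
    # relative to the first base of the input (an assumed frame).
    return sum(count_runs(rna, phase, min_repeats) for phase in (1, 2))


def per_kb(count, length):
    """Express a motif count as motifs per kilobase."""
    return 1000.0 * count / length


if __name__ == "__main__":
    toy = "AAC" * 10 + "AAA" + "AAU" * 10
    print(count_secrete(toy), per_kb(40, 29903))
```

For scale, 40 motifs over the 29,903-nucleotide reference genome (MN908947.3) comes to roughly 1.3 per kb, in line with the abundance quoted above.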

others

A widespread Xrn1-resistant RNA motif composed of two short hairpins The 3’ untranslated region of several beny- and cucumovirus RNAs harbors a so-called ‘coremin’ motif that is required for Xrn1 stalling. The minimal benyvirus stalling site consists of two hairpins of 3 and 4 base pairs respectively. The 5’ proximal hairpin requires a YGAD (Y = U/C, D = G/A/U) consensus loop sequence, whereas the 3’ proximal hairpin loop sequence is variable. The sequence of the 9-nucleotide spacer that separates the hairpins is highly conserved and potentially involved in tertiary interactions. A role for Xrn1 and the host decay machinery has only been shown for the SARS coronavirus nsp1.
Severe acute respiratory syndrome coronavirus nsp1 protein suppresses host gene expression by promoting host mRNA degradation Expression of nsp1, the most N-terminal gene 1 protein, prevented Sendai virus-induced endogenous IFN-beta mRNA accumulation without inhibiting dimerization of IFN regulatory factor 3, a protein that is essential for activation of the IFN-beta promoter.
a conserved BH3-like sequence SARS-CoV E and SARS-CoV-2 E have a C-terminal BH3-like motif and a predicted interactome for E was identified.
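
Consensus notations such as the YGAD loop sequence (Y = U/C, D = G/A/U) can be searched for with a regular expression. The snippet below is a minimal sketch that translates a small subset of IUPAC nucleotide codes into a pattern and reports match positions. The helper names are hypothetical, and the scan checks sequence only: it does not test whether the match sits in a hairpin loop, which the motif described above also requires.

```python
import re

# Subset of IUPAC nucleotide codes, written for RNA.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[CU]",    # pyrimidine
    "R": "[AG]",    # purine
    "D": "[AGU]",   # any base except C
    "N": "[ACGU]",  # any base
}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC_RNA[symbol] for symbol in consensus.upper())


def find_consensus(seq, consensus):
    """Return 0-based start positions of all (overlapping) matches."""
    rna = seq.upper().replace("T", "U")
    # A lookahead lets overlapping occurrences be reported as well.
    lookahead = re.compile(f"(?=({consensus_to_pattern(consensus)}))")
    return [match.start() for match in lookahead.finditer(rna)]


if __name__ == "__main__":
    print(find_consensus("AAUGAGCC", "YGAD"))
```

A hit from this scan is only a candidate; confirming the motif would additionally require a structure prediction for the surrounding region.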

4. interactions (Asif)
