Commands and code used for the RNASeq and proteomics data analysis of SARS-CoV-2 infected samples
The project aims to identify the host cellular response to SARS-CoV-2 infection in the Huh7 cell line. A time-series integrative proteo-transcriptomics design was employed. Huh7 cells were collected at 24, 48 and 72 hours post infection, and RNASeq and proteomics data were generated. An uninfected sample was included as a control. There were three replicates in the design. Proteomics and transcriptomics data were analysed with the R package LIMMA, for both the pairwise and the time-series analyses.
Inputs :
- Proteomics : Raw proteomics data with UniProt accessions as rows and samples as columns. Annotation file with two columns: 1) UniProt accession 2) gene name.
- Transcriptomics : Transcriptomics count data with Ensembl IDs as rows and samples as columns.
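To make the expected input layout concrete, here is a minimal Python sketch that reads a proteomics matrix and its annotation file, then counts missing values per sample. It is only an illustration: the analysis itself is done in R, and the column headers (`Accession`, `Gene_name`), the sample names, the tab-separated format and the intensity values are all assumptions invented for this example.

```python
import csv
import io

# Toy inputs (hypothetical headers, sample names and intensities).
PROTEOMICS_TSV = (
    "Accession\tUNINF_1\tT24_1\tT48_1\tT72_1\n"
    "P04637\t10.2\t10.9\t\t11.8\n"
    "P01308\t8.1\t8.0\t8.4\t8.3\n"
)
ANNOTATION_TSV = (
    "Accession\tGene_name\n"
    "P04637\tTP53\n"
    "P01308\tINS\n"
)


def read_annotation(handle):
    """Return a dict mapping UniProt accession to gene name."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Accession"]: row["Gene_name"] for row in reader}


def read_matrix(handle):
    """Return (sample names, {accession: [values]}); empty cells become None."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    matrix = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        matrix[row[0]] = [float(v) if v.strip() else None for v in row[1:]]
    return samples, matrix


def count_missing(samples, matrix):
    """Count missing values per sample (column)."""
    counts = dict.fromkeys(samples, 0)
    for values in matrix.values():
        for sample, value in zip(samples, values):
            if value is None:
                counts[sample] += 1
    return counts


annotation = read_annotation(io.StringIO(ANNOTATION_TSV))
samples, matrix = read_matrix(io.StringIO(PROTEOMICS_TSV))
missing = count_missing(samples, matrix)
```

With real files, the `io.StringIO(...)` objects would be replaced by opened file handles.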
Outputs :
- PCA, Distribution after normalization
- Up or down regulated proteins & transcripts (time series analysis)
- Up or down regulated proteins & transcripts (pair-wise analysis)
- Count missing values / PCA
- Normalization
- Plotting Distribution / Dimension reduction : PCA 2
- Calculation of Normalization factors
- Filtering of low abundant genes
- Voom transformation
- LIMMA : 6 pair-wise comparisons (UNINF / T24 / T48 / T72)
- LIMMA : time series analysis (T24, T48, T72)
- Pathway enrichment heatmap (All analysis)
- Sankey plot (top 4 pathways)
- Scatter plots (top 4 pathways)
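The six pair-wise comparisons listed above follow from taking every pair among the four conditions (UNINF, T24, T48, T72). The short Python sketch below enumerates them. The contrast naming (`later-earlier`, in the style of a limma contrast string) and the direction of each comparison are assumptions made for illustration, not necessarily what the R notebooks use.

```python
from itertools import combinations

# Conditions in chronological order, with the uninfected control first.
CONDITIONS = ["UNINF", "T24", "T48", "T72"]


def pairwise_contrasts(conditions):
    """Build one 'later-earlier' contrast label per unordered pair of conditions."""
    return [f"{later}-{earlier}" for earlier, later in combinations(conditions, 2)]


contrasts = pairwise_contrasts(CONDITIONS)
for contrast in contrasts:
    print(contrast)
```

Four conditions give 4 x 3 / 2 = 6 pairs, which matches the number of comparisons stated in the list.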
git clone https://github.com/neogilab/COVID19.git
cd COVID19/
All analyses were performed in a Linux (Ubuntu) environment.
- Python3.5+ (Please check https://github.com/zqfang/gseapy for GSEAPY requirements.)
- R packages (Analysis and figure generation) :
Rscript source/requierements.R
# This command will install the following packages:
# NormalyzerDE==1.4.0
# ggfortify==0.4.9
# xlsx==0.6.3
# gplots==3.0.3
# edgeR==3.28.1
# limma==3.42.2
# matrixStats==0.56.0
# reshape2==1.4.3
# ggalluvial==0.11.1
# ggplot2==3.3.0
# dplyr==0.8.5
All data should be placed in a folder called data (transcriptomics data must be in a file named Transcriptomics and proteomics data in a file named Proteomics).
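Before running the notebooks, it can help to check that the expected inputs are in place. The following is a small optional Python sketch, not part of the repository: the function name `missing_inputs` is hypothetical, and it only checks that entries named `Transcriptomics` and `Proteomics` exist inside the data folder, without validating their contents.

```python
from pathlib import Path

# Names taken from the layout described above.
EXPECTED_INPUTS = ("Transcriptomics", "Proteomics")


def missing_inputs(data_dir="data"):
    """Return the expected input names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_INPUTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing in data/: " + ", ".join(missing))
    else:
        print("All expected inputs found.")
```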
Rscript -e "rmarkdown::render('covid-19_proteomics_preprocessing.Rmd')"
Rscript -e "rmarkdown::render('cov-19_time_series_proteomics_antiviral.Rmd')"
Rscript -e "rmarkdown::render('covid-19_transcriptomics_pairwise_comp.Rmd')"
Rscript -e "rmarkdown::render('cov-19_time_series_transcriptomics_antiviral.Rmd')"
Rscript -e "rmarkdown::render('covid-19_transcriptomics_pairwise_comp.Rmd')"
python3 GSEA.py
Rscript -e "rmarkdown::render('covid-19_figures_heatmap.Rmd')"
Rscript -e "rmarkdown::render('covid-19_figures_top_4_pathways.Rmd')"
Code used for the network analysis of the proteomics and transcriptomics data is included in the folder "network_analysis".