NIB-SI/multiOmics-integration

multiOmics data analysis, integration, and visualisation protocol

  • Data integration and modelling with R
  • Systems biology analysis and visualization pipeline in R

Data management framework

Expected measurements

  • one or multiple genotypes
  • under single and multiple abiotic/biotic stressors
  • experiment duration: XY hours, days, ... (time-series experimental design)
  • tissue: single or multiple
  • 'Omics strategies:
    • Hormonomics
    • Transcriptomics
    • Proteomics
    • Metabolomics
    • Phenomics
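
As a rough illustration, the factors listed above could be crossed into a single sample table of the kind the protocol calls Phenodata. This is a minimal base R sketch: the factor levels, column names, replicate count and sample naming scheme are all hypothetical placeholders, not values prescribed by the protocol.

```r
# Hypothetical factor levels; replace them with those of the real experiment.
design_factors <- list(
  Genotype  = c("GT1", "GT2"),
  Treatment = c("control", "drought", "heat"),
  Time_h    = c(0, 24, 72),
  Tissue    = c("leaf", "root"),
  Replicate = 1:3
)

# Full factorial crossing: one row per planned sample.
phenodata <- expand.grid(design_factors, stringsAsFactors = FALSE)

# Build a unique, human-readable sample name from the design factors.
phenodata$SampleName <- with(
  phenodata,
  paste(Genotype, Treatment, paste0("T", Time_h), Tissue,
        paste0("R", Replicate), sep = "_")
)

# One logical column per omics layer, marking which samples are to be measured.
# Here every sample is flagged for every layer (an assumption for illustration).
omics_layers <- c("Hormonomics", "Transcriptomics", "Proteomics",
                  "Metabolomics", "Phenomics")
phenodata[omics_layers] <- TRUE

head(phenodata)
```

A real design is rarely fully crossed, so rows for combinations that are not sampled would be dropped, and the per-layer flags set to FALSE where a sample is not measured in that layer.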

Analysis steps:

  1. Design the Phenodata, a master experimental design table describing the samples for analysis, prior to sample collection, following good data management practice
  2. Data preprocessing and overall inspection
  3. Statistical analysis of individual omics data layers

    • ggplot {ggplot2} - various plots, https://r-graphics.org/chapter-ggplot2
    • corr.test {psych} - Find the correlations, sample sizes, and probability values between elements of a matrix or data.frame
    • cor.plot {psych} - Create an image plot for a correlation or factor matrix
    • pairs.panels {psych} - SPLOM, histograms and correlations for a data matrix
    • rcorr {Hmisc} - Matrix of Correlations and P-values
    • heatmaply_cor {heatmaply} - Cluster heatmap based on plotly
    • corrplot {corrplot} - A visualization of a correlation matrix
    • pheatmap {pheatmap} - A function to draw clustered heatmaps
    • t_test {rstatix} - T-test
    • ggdotplot, ggviolin {ggpubr} - Dot plot, Violin plot
    • metaMDS {vegan} - Nonmetric Multidimensional Scaling with Stable Solution from Random Starts, Axis Scaling and Species Scores
    • {limma}, e.g. for non-targeted proteomics, RNA-seq, ...
      • limma::lmFit - Linear Model for Series of Arrays
      • limma::makeContrasts - Construct Matrix of Custom Contrasts
      • limma::contrasts.fit - Compute Contrasts from Linear Model Fit
      • limma::eBayes - Empirical Bayes Statistics for Differential Expression
      • limma::decideTests - Multiple Testing Across Genes and Contrasts
      • limma::topTable - Table of Top Genes from Linear Model Fit
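
The limma functions listed above are typically chained in the order shown. The following is a minimal sketch on simulated data, not the protocol's own script: the expression matrix, the group names `control` and `stress`, and the 0.05 cut-off are assumptions chosen for illustration, and real input would be preprocessed, log-scale data for one omics layer.

```r
library(limma)

set.seed(1)
# Simulated log-scale matrix: 100 features (rows) x 6 samples (columns).
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("feature", 1:100), paste0("S", 1:6)))
group <- factor(rep(c("control", "stress"), each = 3))
# Shift the first 10 features in the stress group so there is a signal to find.
expr[1:10, group == "stress"] <- expr[1:10, group == "stress"] + 3

# Design matrix without intercept: one column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(expr, design)                       # per-feature linear model
contrast_matrix <- makeContrasts(
  stressVsControl = stress - control,
  levels = design
)
fit2 <- contrasts.fit(fit, contrast_matrix)      # re-express fit as contrasts
fit2 <- eBayes(fit2)                             # moderated t-statistics

# Up/down/not-significant calls per feature, BH-adjusted across features.
results <- decideTests(fit2, adjust.method = "BH", p.value = 0.05)
summary(results)

# Ranked table of the 10 top features for the contrast.
top <- topTable(fit2, coef = "stressVsControl", number = 10,
                adjust.method = "BH")
top
```

For count data such as RNA-seq, a transformation step (for example limma's `voom`) would normally precede `lmFit`; that step is omitted here.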
  4. Correlation-based network inference within each omics level

    Because the results of both methods depend heavily on the selected thresholds, and Lioness node and edge selection using FDR is even more sensitive to the correlation difference cut-off, we suggest using an automated graph thresholding approach.
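
The text does not say which automated thresholding method is meant, so the sketch below swaps in one simple, data-driven rule as an assumption: keep the highest absolute-correlation cut-off at which no node becomes isolated. The simulated data, the Spearman coefficient and the feature names are likewise illustrative only.

```r
set.seed(42)
n_samples <- 30
# Simulated data: samples in rows, 8 features in columns.
x <- matrix(rnorm(n_samples * 8), nrow = n_samples,
            dimnames = list(NULL, paste0("feature", 1:8)))
# Create two correlated modules so the network has some structure.
x[, 2:4] <- x[, 1] + matrix(rnorm(n_samples * 3, sd = 0.5), nrow = n_samples)
x[, 6:8] <- x[, 5] + matrix(rnorm(n_samples * 3, sd = 0.5), nrow = n_samples)

# Absolute Spearman correlations; self-correlations are ignored.
abs_cor <- abs(cor(x, method = "spearman"))
diag(abs_cor) <- 0

# Rule (an assumption, not the protocol's method): the highest cut-off that
# still leaves every node with at least one edge equals the smallest of the
# per-node maximum correlations.
cutoff <- min(apply(abs_cor, 1, max))
adjacency <- abs_cor >= cutoff

# Edge list of the thresholded network (each undirected edge listed once).
idx <- which(adjacency & upper.tri(adjacency), arr.ind = TRUE)
edges <- data.frame(
  from    = colnames(x)[idx[, "row"]],
  to      = colnames(x)[idx[, "col"]],
  abs_rho = abs_cor[idx]
)

cutoff
edges
```

The point of such a rule is that the cut-off follows from the data rather than from a hand-picked value; other criteria (for example network density or connectivity of the largest component) can be substituted in the same place.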

  5. Integration across different omics datasets

  • Canonical Correlation Analysis
  • N-Integration Discriminant Analysis with DIABLO
    • {mixOmics}
      • block.splsda {mixOmics} - N-integration and feature selection with Projection to Latent Structures models (PLS) with sparse Discriminant Analysis
      • plotDiablo {mixOmics} - Graphical output for the DIABLO framework
      • plotVar {mixOmics} - Plot of Variables
      • plotIndiv {mixOmics} - Plot of Individuals (Experimental Units)
      • plotArrow {mixOmics} - Arrow sample plot
      • circosPlot {mixOmics} - circosPlot for DIABLO
      • cimDiablo {mixOmics} - Clustered Image Maps (CIMs) ("heat maps") for DIABLO
      • network {mixOmics} - Relevance Network for (r)CCA and (s)PLS regression
  • Leave-One-Out graphs
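
To show how the DIABLO functions listed above fit together, here is a minimal sketch on simulated data. The two block names, the group labels, the design weight of 0.1 and the `keepX` values are assumptions for illustration; in practice `ncomp` and `keepX` would be tuned on the real data rather than fixed by hand.

```r
library(mixOmics)

set.seed(123)
group <- factor(rep(c("control", "stress"), each = 10))
sample_ids <- paste0("S", seq_along(group))

# Simulate one omics block: samples in rows, features in columns.
# The first 5 features are shifted in the stress group to carry signal.
simulate_block <- function(n_features, prefix) {
  m <- matrix(rnorm(length(group) * n_features), nrow = length(group),
              dimnames = list(sample_ids, paste0(prefix, seq_len(n_features))))
  m[group == "stress", 1:5] <- m[group == "stress", 1:5] + 2
  m
}

# All blocks must contain the same samples in the same order.
blocks <- list(
  transcriptomics = simulate_block(40, "gene"),
  metabolomics    = simulate_block(20, "met")
)

# Design matrix: how strongly the blocks are linked to each other (0 to 1).
design <- matrix(0.1, nrow = length(blocks), ncol = length(blocks),
                 dimnames = list(names(blocks), names(blocks)))
diag(design) <- 0

# Sparse multiblock discriminant analysis; keepX sets the number of
# features retained per block and component.
diablo_model <- block.splsda(
  X = blocks, Y = group, ncomp = 2,
  keepX = list(transcriptomics = c(5, 5), metabolomics = c(5, 5)),
  design = design
)

plotIndiv(diablo_model, legend = TRUE)   # samples in each block's latent space
plotDiablo(diablo_model, ncomp = 1)      # between-block component correlation

# Features selected on component 1 of the transcriptomics block.
selected <- selectVar(diablo_model, block = "transcriptomics", comp = 1)
selected$transcriptomics$name
```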
  6. Integration of data with prior knowledge

Start of the analysis:

  • Data is expected to be arranged within the data management framework, with complete and descriptive metadata files, including the Phenodata file.
  • 'Omics files are expected to be preprocessed (see suggestions in Step 2).
  • Minimal input files can be found in the './input' directory.
  • For Step 3 (Statistical analysis of individual omics data layers), run script 01_Step3.Rmd
  • For Step 4 (Correlation-based network inference within/between each omics level), run script 02_Step4.Rmd
  • For Step 5 (Integration across different omics datasets), run script 03_Step5.Rnw
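
One possible way to run the three scripts from an R session is sketched below. It assumes the scripts sit in the current working directory (their actual location in the repository is not stated here) and that the `rmarkdown` and `knitr` packages are installed; `render_step` is a hypothetical helper, not part of the repository.

```r
# Pick a renderer by file extension: R Markdown (.Rmd) or knitr/Sweave (.Rnw).
# Returns TRUE if the script was found and rendered, FALSE if it was skipped.
render_step <- function(script) {
  if (!file.exists(script)) {
    message("Not found, skipping: ", script)
    return(FALSE)
  }
  if (grepl("\\.Rnw$", script, ignore.case = TRUE)) {
    knitr::knit2pdf(script)     # compiles to PDF; needs a LaTeX installation
  } else {
    rmarkdown::render(script)   # output format is taken from the YAML header
  }
  TRUE
}

step_scripts <- c(
  Step3 = "01_Step3.Rmd",
  Step4 = "02_Step4.Rmd",
  Step5 = "03_Step5.Rnw"
)

# Run the steps in order and report which ones were rendered.
rendered <- vapply(step_scripts, render_step, logical(1))
rendered
```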

For more information, see multiOmics_data_analysis_Protocol