- pISA-tree on GitHub
- Petek, M., Zagorščak, M., Blejec, A. et al. pISA-tree - a data management framework for life science research projects using a standardised directory tree. Sci Data 9, 685 (2022). https://doi.org/10.1038/s41597-022-01805-5
- one or multiple genotypes
- under single and multiple abiotic/biotic stressors
- experiment duration: XY hours, days, ... : time-series experimental design
- tissue: single or multiple
- 'Omics strategies:
  - Hormonomics
  - Transcriptomics
  - Proteomics
  - Metabolomics
  - Phenomics
1. Design Phenodata, a master experimental design table describing the samples for analysis, before sample collection and in line with good data management practice
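As an illustration of what such a design table could look like, the sketch below builds one row per planned sample with `expand.grid`. The factor levels, column names, sample naming scheme and output file name are all hypothetical placeholders, not the layout required by pISA-tree.

```r
# Hypothetical factor levels; replace them with the real experimental design.
phenodata <- expand.grid(
  Genotype    = c("WT", "mutant"),
  Treatment   = c("control", "drought", "pathogen"),
  TimePoint_h = c(0L, 24L, 72L),
  Tissue      = "leaf",
  Replicate   = 1:3,
  stringsAsFactors = FALSE
)

# Sort so that replicates of the same condition sit next to each other.
phenodata <- phenodata[order(phenodata$Genotype, phenodata$Treatment,
                             phenodata$TimePoint_h, phenodata$Replicate), ]
rownames(phenodata) <- NULL

# Build a unique, human-readable sample name (naming scheme is an assumption).
phenodata$SampleName <- sprintf("%s_%s_%03dh_%s_R%d",
                                phenodata$Genotype, phenodata$Treatment,
                                phenodata$TimePoint_h, phenodata$Tissue,
                                phenodata$Replicate)

# Write a tab-separated file; here it goes to a temporary directory.
out_file <- file.path(tempdir(), "phenodata.txt")
write.table(phenodata, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Fixing the table before sampling makes it easy to check that every genotype, treatment and time point combination has the planned number of replicates.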
2. Data preprocessing and overall inspection
   - detection of outliers and faulty measurements
     - boxplot.stats {grDevices} - Box Plot Statistics
     - mad {BiocGenerics} - Compute the median absolute deviation for a vector
     - aq.plot {mvoutlier} - Adjusted quantile plots for multivariate outlier detection
   - data transformation (if needed)
   - interpolation
     - point-to-point
       - approxfun {stats} - Returns a list of points which linearly interpolate given data points, or a function performing the linear (or constant) interpolation.
     - polynomial
       - predict {stats} - A generic function for predictions from the results of various model fitting functions
   - extrapolation
   - imputation
     - for qPCR see Baebler, Š., Svalina, M., Petek, M. et al. quantGenius: implementation of a decision support system for qPCR-based gene quantification. BMC Bioinformatics 18, 276 (2017). https://doi.org/10.1186/s12859-017-1688-7
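The preprocessing steps listed above can be sketched on a toy time series. This is a minimal example with made-up values, using `boxplot.stats` and the base `stats::mad` for outlier flagging, `approxfun` for point-to-point interpolation and `lm` with `predict` for polynomial interpolation. The cut-off of 3 scaled MADs and the polynomial degree are assumptions chosen for illustration only.

```r
# Toy time series (made-up values); 9.5 plays the role of a faulty measurement.
time_h <- c(0, 4, 8, 12, 24, 48)
values <- c(1.0, 1.4, 2.1, 9.5, 3.2, 3.9)

# Outlier detection 1: points beyond the boxplot whiskers.
box_out <- boxplot.stats(values)$out

# Outlier detection 2: distance from the median in units of scaled MAD.
# The threshold of 3 is an assumption, not a recommendation from the protocol.
mad_score  <- abs(values - median(values)) / mad(values)
is_outlier <- mad_score > 3

# Treat flagged measurements as missing.
clean <- values
clean[is_outlier] <- NA
ok <- !is.na(clean)

# Point-to-point (linear) interpolation over the remaining points.
lin_fun <- approxfun(time_h[ok], clean[ok], method = "linear")
lin_12  <- lin_fun(12)

# Polynomial interpolation: fit a degree-2 polynomial, then predict.
d        <- data.frame(time_h = time_h[ok], value = clean[ok])
poly_fit <- lm(value ~ poly(time_h, 2), data = d)
poly_12  <- predict(poly_fit, newdata = data.frame(time_h = 12))
```

By default `approxfun` returns `NA` outside the observed time range, so extrapolation needs an explicit choice (for example its `rule` argument or a fitted model).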
3. Statistical analysis of individual omics data layers
   - ggplot {ggplot2} - various plots, https://r-graphics.org/chapter-ggplot2
   - corr.test {psych} - Find the correlations, sample sizes, and probability values between elements of a matrix or data.frame
   - cor.plot {psych} - Create an image plot for a correlation or factor matrix
   - pairs.panels {psych} - SPLOM, histograms and correlations for a data matrix
   - rcorr {Hmisc} - Matrix of Correlations and P-values
   - heatmaply_cor {heatmaply} - Cluster heatmap based on plotly
   - corrplot {corrplot} - A visualization of a correlation matrix
   - pheatmap {pheatmap} - A function to draw clustered heatmaps
   - t_test {rstatix} - T-test
   - ggdotplot, ggviolin {ggpubr} - Dot plots and violin plots
   - metaMDS {vegan} - Nonmetric Multidimensional Scaling with Stable Solution from Random Starts, Axis Scaling and Species Scores
   - {limma} for e.g. non-targeted Proteomics, RNA-seq, ...
     - limma::lmFit - Linear Model for Series of Arrays
     - limma::makeContrasts - Construct Matrix of Custom Contrasts
     - limma::contrasts.fit - Compute Contrasts from Linear Model Fit
     - limma::eBayes - Empirical Bayes Statistics for Differential Expression
     - limma::decideTests - Multiple Testing Across Genes and Contrasts
     - limma::topTable - Table of Top Genes from Linear Model Fit
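The limma functions listed above are usually chained in the order given. The sketch below runs that chain on a simulated log-scale matrix with two hypothetical groups (`control`, `stress`); the data, group names and contrast are assumptions for illustration, and it requires the Bioconductor package limma to be installed. RNA-seq counts would typically need an extra step such as `limma::voom` first, which is not shown.

```r
library(limma)

# Simulated log-scale abundance matrix: 100 features x 6 samples (made-up data).
set.seed(1)
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("feature", 1:100), paste0("s", 1:6)))
group <- factor(rep(c("control", "stress"), each = 3))

# Design matrix without intercept: one column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit the linear model, define the contrast, and moderate the statistics.
fit       <- lmFit(expr, design)
contrasts <- makeContrasts(stress_vs_control = stress - control, levels = design)
fit2      <- contrasts.fit(fit, contrasts)
fit2      <- eBayes(fit2)

# Per-feature test decisions (-1, 0, 1) and the full results table.
decisions <- decideTests(fit2)
top       <- topTable(fit2, coef = "stress_vs_control", number = Inf)
```

With purely random input, few or no features should pass the adjusted p-value cut-off, which makes this a convenient sanity check before moving to real data.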
4. Correlation based network inference within each omics level

   Results from both methods depend heavily on the selected thresholds, and Lioness node and edge selection using FDR is even more sensitive to the correlation difference cut-off, so we suggest using an automated graph thresholding approach.
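The text does not say which automated thresholding method is meant. One possible heuristic, shown here purely as an assumption, is to pick the highest absolute-correlation cut-off at which the network is still a single connected component. The sketch uses simulated data, Spearman correlation and a small hypothetical helper `is_connected`; none of these choices come from the protocol.

```r
# Simulated data: 8 features sharing a latent signal across 30 samples.
set.seed(42)
n_samples <- 30
latent <- rnorm(n_samples)
x <- sapply(1:8, function(i) latent + rnorm(n_samples, sd = 0.8))
colnames(x) <- paste0("feat", 1:8)

# Absolute correlation matrix; the diagonal is zeroed to avoid self-loops.
cor_abs <- abs(cor(x, method = "spearman"))
diag(cor_abs) <- 0

# Hypothetical helper: breadth-first search to test graph connectivity.
is_connected <- function(adj) {
  visited <- rep(FALSE, nrow(adj))
  visited[1] <- TRUE
  queue <- 1L
  while (length(queue) > 0) {
    node  <- queue[1]
    queue <- queue[-1]
    neighbours <- which(adj[node, ] & !visited)
    visited[neighbours] <- TRUE
    queue <- c(queue, neighbours)
  }
  all(visited)
}

# Scan candidate cut-offs from strictest to loosest and stop at the first
# one that leaves the graph connected.
candidates <- sort(unique(cor_abs[upper.tri(cor_abs)]), decreasing = TRUE)
threshold <- NA_real_
for (cutoff in candidates) {
  if (is_connected(cor_abs >= cutoff)) {
    threshold <- cutoff
    break
  }
}

# Edge list of the thresholded network (row and column indices of features).
adj   <- cor_abs >= threshold
edges <- which(adj & upper.tri(adj), arr.ind = TRUE)
```

The appeal of a data-driven rule like this is that the cut-off is reproducible and does not have to be tuned by hand for every omics layer; other criteria (for example network density or scale-free fit) would be equally valid choices.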
5. Integration across different omics datasets
   - Canonical Correlation Analysis
   - N-Integration Discriminant Analysis with DIABLO
     - {mixOmics}
       - block.splsda {mixOmics} - N-integration and feature selection with Projection to Latent Structures models (PLS) with sparse Discriminant Analysis
       - plotDiablo {mixOmics} - Graphical output for the DIABLO framework
       - plotVar {mixOmics} - Plot of Variables
       - plotIndiv {mixOmics} - Plot of Individuals (Experimental Units)
       - plotArrow {mixOmics} - Arrow sample plot
       - circosPlot {mixOmics} - circosPlot for DIABLO
       - cimDiablo {mixOmics} - Clustered Image Maps (CIMs) ("heat maps") for DIABLO
       - network {mixOmics} - Relevance Network for (r)CCA and (s)PLS regression
- Leave-One-Out graphs
6. Integration of data with prior knowledge
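For the DIABLO integration step, a minimal sketch of how the listed mixOmics functions fit together is given below. It uses the `breast.TCGA` example data shipped with mixOmics rather than data from this protocol, and the design weight of 0.1, the two components and the `keepX` values are arbitrary assumptions; in practice these would be tuned. It requires the Bioconductor package mixOmics to be installed.

```r
library(mixOmics)

# Example multi-omics data bundled with mixOmics (not the protocol's data).
data(breast.TCGA)
blocks <- list(
  mRNA    = breast.TCGA$data.train$mrna,
  miRNA   = breast.TCGA$data.train$mirna,
  protein = breast.TCGA$data.train$protein
)
outcome <- breast.TCGA$data.train$subtype

# Between-block design matrix; the weight 0.1 is an assumption.
design <- matrix(0.1, nrow = length(blocks), ncol = length(blocks),
                 dimnames = list(names(blocks), names(blocks)))
diag(design) <- 0

# Number of features kept per block and component (arbitrary for this sketch).
keep_x <- list(mRNA = c(10, 10), miRNA = c(10, 10), protein = c(10, 10))

diablo <- block.splsda(X = blocks, Y = outcome, ncomp = 2,
                       keepX = keep_x, design = design)

# Sample plot per block and correlation between block components.
plotIndiv(diablo, legend = TRUE)
plotDiablo(diablo, ncomp = 1)
```

The remaining plotting functions in the list (`plotVar`, `plotArrow`, `circosPlot`, `cimDiablo`, `network`) take the same fitted object as their first argument.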
- Data is expected to be arranged within a data management framework, with complete and descriptive metadata files, including the Phenodata file.
- 'Omics files are expected to be preprocessed (see suggestions in Step 2).
- Minimal input files can be found in the './input' directory.
- For Step 3 (Statistical analysis of individual omics data layers), run script 01_Step3.Rmd
- For Step 4 (Correlation based network inference within/between each omics level), run script 02_Step4.Rmd
- For Step 5 (Integration across different omics datasets), run script 03_Step5.Rnw
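One way to run these scripts from an R session is sketched below. The helper `run_step` is hypothetical, and it assumes the scripts sit in the current working directory, that `.Rmd` files are rendered with `rmarkdown::render` and that the `.Rnw` file is compiled with `knitr::knit2pdf`; the repository may intend a different workflow (for example knitting from RStudio).

```r
# Hypothetical helper: pick a renderer based on the script's file extension.
run_step <- function(script) {
  if (!file.exists(script)) {
    stop("Script not found: ", script)
  }
  ext <- tolower(tools::file_ext(script))
  if (ext == "rmd") {
    rmarkdown::render(script)      # R Markdown report
  } else if (ext == "rnw") {
    knitr::knit2pdf(script)        # Sweave/knitr LaTeX document (needs LaTeX)
  } else {
    stop("Unsupported script type: ", ext)
  }
}

# Usage (run from the directory that contains the scripts):
# run_step("01_Step3.Rmd")
# run_step("02_Step4.Rmd")
# run_step("03_Step5.Rnw")
```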
For more info see multiOmics_data_analysis_Protocol