Motif-Informed Network Inference based on single-cell EXpression data
The pipeline is built with Nextflow DSL2 and infers cell-type-specific gene regulatory networks from scRNA-Seq data in plants.
MINI-EX is dual-licensed: the software is distributed under both a proprietary model and an open source model.
MINI-EX v2.* is released! Main features:
- Added support for Solanum lycopersicum (tomato)
- Added support for maize AGPv5
- The motif enrichment analysis can now be omitted, enabling MINI-EX to run on any species (please note: in this mode, the resulting networks are more susceptible to false positives)
- Introducing a new output format: edge table with regulon ranks and edge weights
- Added regulator heatmap as additional output figure (example)
- The complete list of new features can be found in the release notes for v2.0 and v2.1
1. Run expression-based gene regulatory network (GRN) inference (GRNBoost2) given a list of transcription factors (TFs) and a gene-to-cell count matrix
2. Run TF binding site (TFBS) enrichment on the expression-based regulons and filter for TF or TF-Family motifs (default TF-Family)
3a. Filter the previously identified regulons by target genes' (TGs) expression among the defined cell clusters (cluster specificity) using the provided markers
3b. Filter the cell cluster specific regulons by TF expression
4a. Calculate network statistics (out-degree, betweenness, closeness), cluster specificity and functional (GO) enrichment of the target genes of each regulon (if a list of GO terms is provided)
4b. Generate a list of ranked regulons based on Borda ranking and generate an edge table containing edge scores
For the last step, if a list of GO terms of interest is provided:
- First, all combinations of weighted metrics (network statistics, cluster specificity and functional enrichment) are evaluated
- The combination that places half of the expected regulons earliest in the ranking (R50) is chosen for the weighted Borda ranking
Otherwise:
- The network statistics and cluster specificity are used to calculate the Borda ranking (calculated on the geometric mean of the single metrics)
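The ranking logic described above can be illustrated with a minimal Python sketch. It is not the MINI-EX implementation: it assumes that the geometric mean is taken over the per-metric rank positions, that higher values are better for every metric, and that R50 is the rank position at which half of the expected regulons have been recovered. The function names, regulon names and metric values are hypothetical, and metric weighting is not covered.

```python
from math import prod

def rank_positions(scores):
    """Map each regulon to its 1-based rank for one metric.

    Assumption: higher scores are better; ties are broken by regulon name.
    """
    ordered = sorted(scores, key=lambda regulon: (-scores[regulon], regulon))
    return {regulon: pos for pos, regulon in enumerate(ordered, start=1)}

def borda_geometric(metrics):
    """Order regulons by the geometric mean of their per-metric ranks."""
    per_metric = [rank_positions(scores) for scores in metrics.values()]
    n_metrics = len(per_metric)
    combined = {
        regulon: prod(ranks[regulon] for ranks in per_metric) ** (1 / n_metrics)
        for regulon in per_metric[0]
    }
    # A lower combined rank means the regulon sits closer to the top.
    return sorted(combined, key=lambda regulon: (combined[regulon], regulon))

def r50(ranking, expected):
    """Return the rank position at which half of the expected regulons are seen."""
    needed = -(-len(expected) // 2)  # ceiling of len(expected) / 2
    seen = 0
    for pos, regulon in enumerate(ranking, start=1):
        if regulon in expected:
            seen += 1
            if seen >= needed:
                return pos
    return None  # fewer than half of the expected regulons are in the ranking

# Hypothetical metric values for three regulons.
metrics = {
    "out_degree": {"TF_A": 120, "TF_B": 40, "TF_C": 75},
    "betweenness": {"TF_A": 0.30, "TF_B": 0.05, "TF_C": 0.10},
    "cluster_specificity": {"TF_A": 2.0, "TF_B": 9.0, "TF_C": 5.0},
}
ranking = borda_geometric(metrics)
half_recovered_at = r50(ranking, expected={"TF_B", "TF_C"})
```

Under these assumptions, a lower R50 means that the expected regulons appear earlier in the ranking, which is why it can serve to compare metric combinations.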
Note: step 2 can be omitted when no motif mapping data is available (motif mapping data is provided for Arabidopsis, rice and maize). However, use this mode with caution: without motif data, the networks will be less precise.
- Gene-to-cell count matrix (genes as rows and cells as columns)
- List of TFs
- Seurat output from FindAllMarkers
- Tab-separated file containing the cluster identity of each cell (cell_barcode \t cluster_id)
- Tab-separated file containing the cluster annotation (cluster_id \t cluster_annotation)
- (Optional) List of GO terms of interest
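As a quick sanity check before a run, the two tab-separated inputs listed above can be cross-checked against each other. The following is a minimal sketch, not part of MINI-EX: the helper names, cell barcodes and cluster annotations are hypothetical, and it assumes both files have exactly two columns and no header row.

```python
import csv
import io

def read_two_column_tsv(handle):
    """Read a two-column tab-separated file into a dict (column 1 -> column 2)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        mapping[row[0]] = row[1]
    return mapping

def unannotated_clusters(cell_to_cluster, cluster_to_annotation):
    """Return cluster ids assigned to cells but missing from the annotation file."""
    return sorted(set(cell_to_cluster.values()) - set(cluster_to_annotation))

# Hypothetical file contents; in practice these would be opened from disk.
cells_file = io.StringIO("AAACCTG-1\t0\nAAACGGG-1\t1\nAAAGATG-1\t2\n")
annotation_file = io.StringIO("0\tepidermis\n1\tcortex\n")

cell_to_cluster = read_two_column_tsv(cells_file)
cluster_to_annotation = read_two_column_tsv(annotation_file)
missing = unannotated_clusters(cell_to_cluster, cluster_to_annotation)
```

In this example cluster `2` has cells assigned to it but no annotation, so it is reported in `missing`.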
As the pipeline can be run in parallel for multiple datasets, all the inputs can be provided as paths to the dedicated directories.
All input files should have specific extensions and names, as shown here.
- regulons_output folder containing tab-separated files with the inferred regulons, an edge table, info per TF, and an Excel file with the ranked regulons and relative metadata
- figures folder containing a clustermap reporting the distribution of the different regulons across the cell clusters, and two heatmaps showing the cell cluster specificity and DE calls of the top 150 regulons, respectively.
- GOenrichment_output folder containing a tab-separated file with GO enrichment for the different regulons with relative statistics
- GRNBoost2_output folder containing a TF-TG tab-separated file resulting from the GRNBoost2 run
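For downstream exploration, a TF-TG table such as the GRNBoost2 output can be grouped into one target set per TF. The sketch below is only an illustration: it assumes three tab-separated columns (TF, target gene, importance score) with an optional header row, which should be checked against the actual file. The function name, threshold and gene names are hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_regulons(handle, min_importance=0.0):
    """Group TF-TG edges by TF, keeping edges at or above min_importance.

    Assumption: columns are TF, target gene, importance score.
    """
    regulons = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        tf, target, importance = row[0], row[1], row[2]
        try:
            weight = float(importance)
        except ValueError:
            continue  # non-numeric score, e.g. a header row
        if weight >= min_importance:
            regulons[tf][target] = weight
    return dict(regulons)

# Hypothetical edge list; in practice this would be opened from disk.
edges = io.StringIO(
    "TF\ttarget\timportance\n"
    "TF1\tgeneA\t12.5\n"
    "TF1\tgeneB\t0.4\n"
    "TF2\tgeneA\t3.1\n"
)
regulons = load_regulons(edges, min_importance=1.0)
```

Here the edge from `TF1` to `geneB` falls below the hypothetical threshold and is dropped, leaving one target for each TF.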
A detailed overview of the necessary input files and expected output files can be found here.
Define paths in the config file to all the required inputs.
nextflow -C miniex.config run miniex.nf
Having problems running MINI-EX? Check the FAQ.
Should you have any questions or suggestions, please send an e-mail to [email protected].
Should you encounter a bug, please open an issue.
When publishing results generated using MINI-EX, please cite:
Ferrari C, Manosalva Pérez N, Vandepoele K. MINI-EX: Integrative inference of single-cell gene regulatory networks in plants. Mol Plant. 2022 Nov 7;15(11):1807-1824. doi: 10.1016/j.molp.2022.10.016. Epub 2022 Oct 27. PMID: 36307979.