This repository contains gene features (gene x feature matrices) used in the Finucane lab, primarily for gene prioritization methods.
Most raw data is in the form of (cell x gene) or (tissue x gene) count or TPM matrices. Experiments were performed at either the bulk or single-cell level, and in some cases single cells are merged before we derive features. In many cases raw data is not available in the public repository, either because of its total size or because it is currently unpublished and we do not have permission to share it. Please update this list every time a new dataset is added.
- human_multiple from the GTEx consortium (v8)
- human_immune from the Human Cell Atlas
- human_heme from Corces et al.
- human_pbmc from Zheng et al.
- human_pancreas from Muraro et al.
- human_gut from Smillie et al.
- human_brain from Lake et al.
- human_brain2 from Miller et al.
- mouse_immume from Yoshida et al.
- mouse_brain from Saunders et al.
- mouse_aorta from Kalluri et al.
- mouse_islets from Sharon et al.
- mouse_digestive from Gao et al.
- mouse_heart from Hu et al.
- human_colon from Kinchen et al.
- mouse_multiple from Han et al.
- mouse_lung from Cohen et al.
- mouse_thymus from Kernfeld et al.
- human_placenta from Liu et al.
- human_brain3 from Li et al.
- mouse_microglia from Hammond et al.
- mouse_gutendoderm from Nowotschin et al.
- mouse_development from Cao et al.
- human_coloncancer from Zhang et al.
- mouse_nerve from Wolbert et al.
- mouse_hemogenicendothelium from Zhu et al.
- human_retina from Lu et al.
- human_pancreasductal from Qadir et al.
- mouse_hairfollicle from Shin et al.
- mouse_adipocyte from Zhong et al.
- mouse_muscle from Micheli et al.
- mouse_airway from Miller et al.
- human_thymus from Park et al.
- human_colon2 from James et al.
- mouse_endothelium from Kalucka et al.
- mouse_vagina from Ali et al.
- human_eye from Orozco et al.
- human_hippocampus from Zhong et al.
- human_muscle from Rubenstein et al.
- human_intestine from Wang et al.
- human_kidney from Liao et al.
- human_monocytes from Villani et al.
- human_liver from Dobie et al.
- human_fetalblood from Popescu et al.
- human_bonemarrow from Oetjen et al.
- mouse_brain2 from Rosenberg et al.
- mouse_brain3 from Zeisel et al.
- mouse_brain4 from Welch et al.
- human_brain4 from Welch et al.
- mouse_brain5 from Kim et al.
- mouse_kidney from Rasnick et al.
- human_nk from Ferrari de Andrade et al.
- human_retina2a from Menon et al.
- human_kidney2 from Stewart et al.
- human_tcell from Szabo et al.
- human_bladder from Yu et al.
- human_kidney3 from Wilson et al.
- human_embryo from Zhou et al.
- mouse_epithelium from Sharir et al.
- human_lymphnodes from Takeda et al.
- mouse_multiple2 from Pisco et al.
- human_lung from Adams et al.
- mouse_gastrulation from Pijuan-Sala et al.
- human_prostate from Henry et al.
- human_ileum from Martin et al.
- human_airway from Deprez et al.
- human_csf from Schafflick et al.
- human_testis from Hermann et al.
- human_synovialfibroblast from Wei et al.
- mouse_muscle2 from De Micheli et al.
We derive features underlying tissue type / cellular processes using:
- Features that explain the most variance across (and within) cell populations (gene loadings on top X PCs, gene loadings on top X ICs, within-cluster PCs and ICs)
- Features that represent genes that define predefined cell populations or identified clusters (one vs. all differential gene expression -- t-stat, DE genes)
- Features that represent expression programs shared across cell types (TBD, co-expression gene modules)
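
As an illustration of the one-vs-all t-statistic mentioned in the list above, here is a minimal sketch in Python. It is not the repository's code (packages are installed via install.R, so the pipeline is presumably R). The function names, the input layout (a dict of per-cell expression values per gene, plus one cluster label per cell), and the choice of Welch's unequal-variance t-statistic are all assumptions made for this example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples; NaN if it cannot be computed."""
    if len(a) < 2 or len(b) < 2:
        return float("nan")
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        return float("nan")
    return (mean(a) - mean(b)) / standard_error


def one_vs_all_tstat(expr, labels):
    """Compare each cluster against all other cells, gene by gene.

    expr   -- hypothetical layout: {gene: [value for each cell]}
    labels -- cluster label for each cell, in the same cell order
    Returns {gene: {cluster: t-statistic}}.
    """
    clusters = sorted(set(labels))
    result = {}
    for gene, values in expr.items():
        result[gene] = {}
        for cluster in clusters:
            inside = [v for v, lab in zip(values, labels) if lab == cluster]
            outside = [v for v, lab in zip(values, labels) if lab != cluster]
            result[gene][cluster] = welch_t(inside, outside)
    return result


# Toy example: one gene, six cells, two clusters.
toy_expr = {"GENE_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]}
toy_labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
toy_tstats = one_vs_all_tstat(toy_expr, toy_labels)
```

A large positive value means the gene is more highly expressed in that cluster than in the rest of the cells. With only two clusters, the two statistics are equal in magnitude and opposite in sign.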
Features can be found here.
(Need to update) Run install.R to install all necessary packages. Working code for each type of data is named similarly and lives in the code directory.
- Read in, QC, filter, scale, and normalize data (e.g. plots/human_immune/variablegenes.pdf)
- Perform PCA and ICA across all cells (where available), or across meta-cells or tissues (e.g. plots/human_immune/pcaelbow.pdf)
- Perform clustering and UMAP, and plot features on the projection (e.g. plots/human_immune/umap_clusters.pdf)
- Perform differential expression analysis (e.g. plots/human_immune/umap_degenes.pdf)
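
To make the idea of "gene loadings" from the PCA step concrete, the sketch below computes the loadings of the first principal component for a tiny (cell x gene) matrix using power iteration. This is only an illustration in plain Python: the function name and parameters are hypothetical, the repository's own code is not shown here, and a real analysis would use an SVD or PCA routine from a numerical library and keep more than one component.

```python
import math
import random


def first_pc_loadings(matrix, n_iter=100, seed=0):
    """Unit-length gene loadings of the first principal component.

    matrix -- list of rows (cells or tissues), each a list of gene values
    Uses power iteration on X^T X, where X is the column-centered matrix.
    The sign of the returned vector is arbitrary.
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    # Center each gene (column) at zero mean.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    # Random positive start vector, normalized to unit length.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_genes)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(n_iter):
        # scores = X v (projection of every cell onto the current direction)
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        # w = X^T scores = X^T X v
        w = [sum(scores[i] * centered[i][j] for i in range(n_cells))
             for j in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data; keep the current vector
        v = [x / norm for x in w]
    return v


# Toy example: gene 2 is always twice gene 1, gene 3 is constant.
toy_matrix = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 5.0],
    [3.0, 6.0, 5.0],
    [4.0, 8.0, 5.0],
]
toy_loadings = first_pc_loadings(toy_matrix)
```

In the toy matrix all variance lies along the direction where the second gene changes twice as fast as the first, so those two genes carry the loading weight in a 1:2 ratio and the constant gene gets a loading of zero.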
Current output features:
- Unweighted gene loadings from PCA (e.g. features/human_immune/projected_pcaloadings.txt.gz)
- Unweighted gene loadings from ICA (e.g. features/human_immune/projected_icaloadings.txt.gz)
- Unweighted gene loadings from PCA within cluster (e.g. features/human_immune/projected_pcaloadings_clusters.txt.gz)
- Average expression across clusters (pre-defined and identified) (e.g. features/human_immune/average_expression.txt)
- One-vs-all t-test statistic (pre-defined and identified) (e.g. features/human_immune/diffexprs_tstat_clusters.txt)
- Differentially expressed genes across clusters (pre-defined and identified) (e.g. features/human_immune/diffexprs_genes_clusters.txt)
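
For downstream use, a gzipped feature file such as the PCA loadings above could be read along the following lines. This is a hedged sketch: the file layout is not documented here, so the code assumes a tab-delimited table with a header row and the gene identifier in the first column. The function name is hypothetical, and the delimiter or header handling may need adjusting to match the actual files.

```python
import csv
import gzip


def read_feature_matrix(path):
    """Read a gzipped gene x feature table.

    Assumed layout (not confirmed by the repository): tab-delimited,
    first row is a header, first column is the gene identifier.
    Returns (feature_names, {gene: [feature values as floats]}).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        feature_names = header[1:]
        matrix = {
            row[0]: [float(value) for value in row[1:]]
            for row in reader
            if row  # skip blank lines
        }
    return feature_names, matrix
```

Usage would look like `names, matrix = read_feature_matrix("features/human_immune/projected_pcaloadings.txt.gz")`, after which `matrix[gene]` holds that gene's feature vector. The uncompressed .txt outputs would need plain `open` instead of `gzip.open`.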
Next steps:
- Gene modules (WGCNA)
- Co-expression analysis