Generalized Matrix Decomposition Framework (GMDF)

GMDF is a method for unsupervised meta-analysis of diverse datasets, including single-cell genomics data.

Guidelines:

See the GMDF_stepByStepExample.R script to set up the environment, generate the pan-cancer shared programs (using N1 = 25), and run a toy GMDF example.

To reproduce the CD8 pan-cancer programs:
source("GMDF_wrapper.R")
rslts <- GMDF_combine_pancancer()

To use GMDF for other datasets:
rslts <- GMDF_wrapper(E, a, k, k1, N1, outputdir)

Notation:

g = number of genes
k = number of shared programs
k1 = number of context-specific programs
n = number of datasets
m = number of contexts (e.g., cancer types)

Input:

E: list of n gene expression matrices, one per dataset/condition.
a: n x m matrix giving the value of each of the m annotations (i.e., contexts) for each of the n datasets provided in E (see examples below).
k: Initial estimate of the number of shared programs.
k1: Initial estimate of the number of context-specific programs per context.
N1: Number of times to run GMDF. If N1 > 1 then multiple solutions will be combined.
outputdir: The output directory in which the results of each single GMDF run are saved. Required if N1 > 1.
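
As a rough illustration of the inputs listed above, the sketch below builds a toy E and a in base R. The sizes, gene and dataset names, the Poisson counts, and the 0/1 encoding of the annotation matrix are all assumptions made for this example; the genes-in-rows orientation is inferred from the E[[i]][g,] indexing used elsewhere in this README. The GMDF_wrapper call is left commented out because it requires sourcing GMDF_wrapper.R first, and the values passed for k, k1, and N1 are arbitrary.

```r
set.seed(42)

# Hypothetical sizes: 3 datasets, 2 contexts, 200 genes
n <- 3
m <- 2
g <- 200
genes <- paste0("gene", seq_len(g))
cells_per_dataset <- c(50, 60, 40)

# E: list of n expression matrices (assumed genes x cells), one per dataset
E <- lapply(seq_len(n), function(i) {
  n_cells <- cells_per_dataset[i]
  matrix(rpois(g * n_cells, lambda = 2), nrow = g,
         dimnames = list(genes, paste0("d", i, "_cell", seq_len(n_cells))))
})
names(E) <- paste0("dataset", seq_len(n))

# a: n x m matrix; here a 0/1 indicator (an assumption) of which context
# each dataset belongs to: datasets 1 and 2 -> contextA, dataset 3 -> contextB
a <- matrix(0, nrow = n, ncol = m,
            dimnames = list(names(E), c("contextA", "contextB")))
a[cbind(1:3, c(1, 1, 2))] <- 1

# Not run here: requires source("GMDF_wrapper.R"); argument values are arbitrary
# rslts <- GMDF_wrapper(E, a, 5, 2, 1, "gmdf_out")
```

Checking dim(a) against length(E) before calling the wrapper is a cheap way to catch a mismatch between the datasets and their annotations.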

Output:

If N1 > 1, the output will be the result of N1 combined GMDF runs:

sig – the final shared signatures (top 100 genes in the programs represented in the Wf matrix).
Wf – the final GMDF shared programs based on all the GMDF runs, such that Wf[i,j] denotes the weight of gene i in program j.
W.multi.run – weight of genes (rows) in each of the shared programs identified in the single GMDF runs (columns)
W.clusters – the clustering annotation of each one of the shared GMDF programs from the single runs
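
To make the relationship between Wf and sig concrete, the sketch below takes the top-weighted genes from each column of a gene-by-program weight matrix. It is only an illustration of the "top genes per program" idea: top_genes is a hypothetical helper, not a GMDF function, and GMDF may derive sig differently (for example with additional filtering).

```r
# Hypothetical helper: names of the n_top highest-weight genes per program
top_genes <- function(Wf, n_top = 100) {
  n_top <- min(n_top, nrow(Wf))
  sig <- lapply(seq_len(ncol(Wf)), function(j) {
    # Rank genes by their weight in program j, largest first
    rownames(Wf)[order(Wf[, j], decreasing = TRUE)[seq_len(n_top)]]
  })
  names(sig) <- colnames(Wf)
  sig
}

# Toy weight matrix: 3 genes (rows) x 2 programs (columns)
Wf_toy <- matrix(c(0.9, 0.1, 0.5,
                   0.2, 0.8, 0.3),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("P1", "P2")))

sig_toy <- top_genes(Wf_toy, n_top = 2)
# sig_toy$P1 is c("geneA", "geneC"); sig_toy$P2 is c("geneB", "geneC")
```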

If N1 = 1, the output will be the result of a single GMDF run, i.e., the decomposition of the data:

Et[[i]] ~ (Hw[[i]]) x W + ∑j(a[i,j] x H[[j]][[i]] x A[[j]])

where Et[[i]] is transpose(E[[i]][g,]). For additional information about the output and its interpretation, see Figure 1A and equation (1) in the STAR METHODS describing GMDF.

Hw - shared program usage per dataset: a list of n matrices, one per dataset, each of size g x k
W - g x k matrix representing the shared programs.
A - context-specific programs: a k1 x g matrix for each of the m contexts
H[[j]] - context-specific program usage for context j, with a (g x k1) matrix for each of the n datasets
obj - the final value of the objective function, minimizing the reconstruction error
objAA - the values of the objective function in each iteration; NA in case the run terminated before the 100th iteration.
param - the parameters that were provided as input and used
input.data - the input object
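
The decomposition of a single run can be sketched with random matrices to show how the shared and context-specific terms add up. Everything here is a stand-in rather than GMDF output. In particular, the sketch assumes that the usage matrices (Hw[[i]], H[[j]][[i]]) have one row per cell, and that W is stored as g x k and therefore transposed in the product so that the dimensions conform; the orientation of the objects actually returned by GMDF may differ, so transpose as needed.

```r
set.seed(1)

# Hypothetical sizes
g <- 50; k <- 3; k1 <- 2; n <- 2; m <- 2
cells <- c(30, 40)            # assumed number of cells in each dataset
a <- rbind(c(1, 0), c(0, 1))  # dataset i belongs to context i (assumed 0/1 coding)

# Random stand-ins for the fitted factors
W  <- matrix(runif(g * k), g, k)                                    # shared programs
A  <- lapply(seq_len(m), function(j) matrix(runif(k1 * g), k1, g))  # context programs
Hw <- lapply(cells, function(ci) matrix(runif(ci * k), ci, k))      # shared usage
H  <- lapply(seq_len(m), function(j) {
  lapply(cells, function(ci) matrix(runif(ci * k1), ci, k1))        # context usage
})

# Reconstruction of dataset i: shared term plus annotation-weighted context terms
reconstruct <- function(i) {
  shared <- Hw[[i]] %*% t(W)
  specific <- Reduce(`+`, lapply(seq_len(m), function(j) {
    a[i, j] * (H[[j]][[i]] %*% A[[j]])
  }))
  shared + specific
}

Et_hat <- lapply(seq_len(n), reconstruct)  # each element is cells x genes
```

Because a[i, j] is zero for contexts that do not apply to dataset i, only the matching context-specific programs contribute to that dataset's reconstruction.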

Dependencies: plyr, rliger.

