
the number of coefficients (proportions of cell types) in extended model #3

Open
gurkanbal opened this issue Jun 1, 2017 · 4 comments


@gurkanbal

gurkanbal commented Jun 1, 2017

Hi,

Bseq_Sc is a great tool. I like it!

However, there is a problem with fitEdgeR: if I want to perform the analysis using five or more cell types, fitEdgeR returns the following error:

"Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
Microglia"
(Microglia is the last of the coefficients.)

Nevertheless, it works well with four or fewer cell types. Is there a bug? Do you have any advice for analyses with five or more cell types?

Best
Gürkan

gurkanbal changed the title from "the number of coveriance (proportions of cell types) in extended model" to "the number of coefficients (proportions of cell types) in extended model" on Jun 3, 2017
@renozao

renozao commented Jun 6, 2017

The limit in the number of coefficients you can estimate is driven by your sample size. You typically need at least n+2 samples to estimate n coefficients (counting all coefficients in the model: intercept, covariates, cell types, group of interest).

How many samples do you have (number of columns in eset), and how many coefficients are in the model?

@gurkanbal
Author

The eset contains 156 samples (columns of eset), and the model contains 2 covariates, 5 cell types and the group of interest.

It looks like this:

"fit_edger_ext <- fitEdgeR(eset, ~ Gender + ApoE + oligodendrocytes + astrocytes + microglia + neurons + endothelial + diagnosis_class, coef = 'diagnosis_classDisease_AD')"

This command returned the following error message:

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
Design matrix not of full rank. The following coefficients not estimable:
endothelial

But if any one of the cell types is removed and the model is run with the remaining 4 cell types, it works.
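As a rough check of the n+2 rule of thumb against these numbers, the Python sketch below counts design-matrix columns the way R's default treatment contrasts would (a numeric term gives one column, a factor with k levels gives k - 1). The level counts are assumptions: Gender and diagnosis_class are taken as two-level factors and ApoE as a single numeric covariate; if ApoE is a multi-level factor, its entry should be its number of levels instead.

```python
def design_columns(terms, intercept=True):
    """Count design-matrix columns under treatment contrasts.

    `terms` maps each model term to its number of levels: use 1 for a
    numeric variable (one column) and k >= 2 for a factor (k - 1 columns).
    """
    count = 1 if intercept else 0
    for levels in terms.values():
        count += 1 if levels == 1 else levels - 1
    return count


# Hypothetical level counts for the model in this thread (assumptions).
terms = {
    "Gender": 2,            # assumed two-level factor
    "ApoE": 1,              # assumed numeric covariate
    "oligodendrocytes": 1,  # cell-type proportions are numeric
    "astrocytes": 1,
    "microglia": 1,
    "neurons": 1,
    "endothelial": 1,
    "diagnosis_class": 2,   # assumed two-level factor
}
n_coef = design_columns(terms)
n_samples = 156
print(n_coef, n_samples >= n_coef + 2)  # prints: 9 True
```

Under these assumptions the sample-size requirement is met comfortably, so the number of samples alone does not explain the rank error.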

@renozao

renozao commented Jun 12, 2017

The cell type proportions probably sum up to one within each sample, which makes them together collinear with the intercept in the model. This is why removing any one of them makes the model completely estimable.
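To illustrate the collinearity, here is a minimal Python sketch (not edgeR code) that builds a toy design matrix with an intercept, five simulated cell-type proportions that sum exactly to one in every sample, and a hypothetical two-level group indicator. It then computes the matrix rank exactly using rational arithmetic. The data are simulated; only the column structure mirrors the model in this thread.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def simulate_design(n_samples=20, n_cell_types=5, seed=1):
    """Rows are [intercept, p_1..p_k, group]; proportions sum to exactly 1."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_samples):
        weights = [rng.randint(1, 100) for _ in range(n_cell_types)]
        total = sum(weights)
        props = [Fraction(w, total) for w in weights]
        group = i % 2  # hypothetical two-level diagnosis indicator
        rows.append([1] + props + [group])
    return rows


full = simulate_design()
# The proportion columns add up to the intercept column, so rank < 7.
print(len(full[0]), matrix_rank(full))

# Drop the last cell type (column index 5): the dependency disappears.
dropped = [row[:-2] + row[-1:] for row in full]
print(len(dropped[0]), matrix_rank(dropped))
```

The full matrix has fewer independent columns than columns, which is exactly the "not of full rank" condition glmFit reports; removing any one proportion column breaks the sum-to-one dependency.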

I would try correcting only for cell types that either show significant differences between the diagnosis_class groups or are very dominant relative to the other cell types.
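One way to apply this advice is sketched below as a hypothetical Python helper (not part of the tool or edgeR). It uses Welch's t statistic as a rough stand-in for a significance test and a mean-proportion cutoff to define "dominant"; both cutoffs, and the toy proportions, are illustrative assumptions rather than recommended values.

```python
from math import inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if mean(a) == mean(b) else inf
    return (mean(a) - mean(b)) / se


def select_cell_types(props, groups, t_cutoff=2.0, dominance_cutoff=0.3):
    """Keep cell types that differ between two groups or are dominant.

    props: dict mapping cell type -> per-sample proportions
    groups: per-sample group labels (exactly two distinct values)
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    keep = []
    for cell_type, values in props.items():
        a = [v for v, g in zip(values, groups) if g == labels[0]]
        b = [v for v, g in zip(values, groups) if g == labels[1]]
        differs = abs(welch_t(a, b)) > t_cutoff
        dominant = mean(values) > dominance_cutoff
        if differs or dominant:
            keep.append(cell_type)
    return keep


# Toy numbers for illustration only, not real data.
groups = ["Control"] * 4 + ["AD"] * 4
props = {
    "neurons":    [0.50, 0.52, 0.48, 0.51, 0.30, 0.32, 0.29, 0.31],
    "microglia":  [0.05, 0.06, 0.05, 0.04, 0.05, 0.05, 0.06, 0.04],
    "astrocytes": [0.20, 0.21, 0.19, 0.20, 0.35, 0.34, 0.36, 0.35],
}
keep = select_cell_types(props, groups)
print(keep)  # ['neurons', 'astrocytes']
```

If every cell type passes the filter, the proportions still sum to one and at least one of them has to be left out of the model.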

@charlesgwellem

If you followed this tutorial to the end, could you please share how you prepared the expression set?
