diff_mean_test() and group sizes #98

Open
jhb1980 opened this issue Mar 18, 2021 · 2 comments

jhb1980 commented Mar 18, 2021

Hi Christoph,

Absolutely phenomenal tool set; I'm really excited by the possibility of running DE testing on the output of sctransform! Could you comment on the minimum number of cells needed for "robust" DE calling between two groups when using diff_mean_test()?

@ChristophH
Collaborator

That's a good question, and I've always wanted to formally test this. More cells is always better, but a more nuanced answer takes two more aspects into account:

  1. The expression level of the gene, i.e. mean UMI counts in group 1
  2. The fold change, i.e. log2(mean_in_group1/mean_in_group2)

I've run simulations to get a better idea of how exactly these two factors affect the number of cells needed. This figure sums up the results:

[Figure: simulated recovery of DE genes by number of cells per group, expression level, and log2FC]

For example, to detect a log2FC of 2 for a gene with mean 0.1 (so going from 0.1 to 0.4, bottom row, third panel), you would need about 100 cells per group. A decrease (negative log2FC, panel above) would be much harder to detect: going from 0.1 to 0.025 gives ca. 80% recovery with 200 cells per group.

On the other hand, if a gene is essentially absent from one group and then goes up to medium-high expression (say from 0.001 to 1), even 20 cells will be sufficient.
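The arithmetic behind these examples can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `changed_mean` and `log2_fold_change` are made up here, and the direction of the fold change follows the worked examples above (from the starting mean to the new mean), which is an assumption about the sign convention.

```python
import math

def changed_mean(base_mean, log2fc):
    """Mean reached when a log2 fold change is applied to a starting mean."""
    return base_mean * 2.0 ** log2fc

def log2_fold_change(base_mean, new_mean):
    """log2 fold change when going from base_mean to new_mean."""
    return math.log2(new_mean / base_mean)

# The three cases mentioned in the thread
up = changed_mean(0.1, 2)               # increase with log2FC = 2
down = changed_mean(0.1, -2)            # decrease with log2FC = -2
on_off = log2_fold_change(0.001, 1.0)   # nearly absent -> medium-high

print(up, down, round(on_off, 2))
```

The last case corresponds to a log2FC of roughly 10, which helps explain why so few cells are enough to detect it.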

Notes regarding these results

  • No p-value correction for multiple testing was applied
  • False positives were not examined, so these results say nothing about specificity overall
  • Group sizes are assumed to be balanced
  • Corrected counts are assumed, i.e. no additional variability in the counts due to sequencing depth
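For readers who want to explore this kind of question themselves, a recovery-rate simulation could be sketched roughly as follows. This is not the code behind the figure and it does not call sctransform. It assumes Poisson-distributed counts, balanced groups, and no multiple-testing correction, and it uses a plain two-sided permutation test on the difference in means as a stand-in for `diff_mean_test()`. All function names and defaults (`recovery_rate`, `n_perm`, `alpha`, and so on) are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) count (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def perm_pvalue(rng, a, b, n_perm=200):
    """Two-sided permutation p-value for the difference in group means."""
    n_a, n_b = len(a), len(b)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of cells
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def recovery_rate(base_mean, log2fc, n_cells, n_sim=100, alpha=0.05, seed=1):
    """Fraction of simulated genes called DE at an uncorrected alpha."""
    rng = random.Random(seed)
    new_mean = base_mean * 2.0 ** log2fc  # assumed direction: base -> new
    detected = 0
    for _ in range(n_sim):
        a = [poisson(rng, base_mean) for _ in range(n_cells)]
        b = [poisson(rng, new_mean) for _ in range(n_cells)]
        if perm_pvalue(rng, a, b) < alpha:
            detected += 1
    return detected / n_sim

# Example: sweep group sizes for a gene going from mean 0.1 to 0.4
for n in (20, 50, 100):
    print(n, recovery_rate(0.1, 2, n, n_sim=50))
```

Because the distributional assumptions and the test differ from the original simulations, the numbers from this sketch should not be expected to match the figure; it only illustrates how expression level, fold change, and cells per group interact.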

R notebook

Author

jhb1980 commented Mar 19, 2021

Brilliant, thank you Christoph!
