big data set and K #8

yuGithuuub · 2020-05-19T07:19:25Z

Hey ALRA team,
I would like to ask about ALRA's performance on very large data sets (~600k cells).
I am using the Scanpy pipeline and I have 2 questions:

1. I noticed in your article that an excessively large value of k seems to have little effect on the results. Is it appropriate to use the default parameter of k = 50?
2. After subsetting the data, ALRA seemed to perform better. Is this related to the k value?

By the way, ALRA provides the best experience in certain aspects! ^_^
Looking forward to your reply

JunZhao1990 · 2021-01-12T22:34:35Z

Thanks for your interest in ALRA! And sorry for the very late response.
To better understand your question, could you provide the k values that ALRA estimates for the whole data set and for the subset? You can run the choose_k() function in the ALRA code to find the estimated k.
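
For readers following along from a Python pipeline, here is a minimal sketch of the kind of rank heuristic that choose_k() is understood to apply: look at the gaps between consecutive singular values, treat the trailing gaps as noise, and keep the last gap that stands out from that noise. The function names, the `noise_start` and `thresh` defaults, and the use of an exact SVD are assumptions made for illustration. This is not the ALRA implementation, and comparing the full data set against a subset should still be done with ALRA's own choose_k().

```python
import numpy as np


def choose_k_from_singular_values(s, noise_start=80, thresh=6.0):
    """Pick a rank k from a decreasing 1-D array of singular values.

    Hypothetical helper: the defaults are illustrative assumptions.
    """
    s = np.asarray(s, dtype=float)
    # Gap between each pair of consecutive singular values.
    gaps = s[:-1] - s[1:]
    # Treat the trailing gaps as noise and summarise them.
    noise = gaps[noise_start - 1:]
    mu = noise.mean()
    sigma = noise.std(ddof=1)
    # How many noise standard deviations each gap lies above the noise mean.
    z = (gaps - mu) / sigma
    significant = np.flatnonzero(z > thresh)
    if significant.size == 0:
        raise ValueError("no gap exceeds the threshold")
    # The last significant gap separates singular value k from k + 1 (1-based).
    return int(significant[-1]) + 1


def estimate_k(X, K=100, noise_start=80, thresh=6.0):
    """Estimate k for a small dense matrix X (cells x genes).

    An exact SVD is used here for simplicity; it is only practical for
    small matrices, so a truncated or randomized SVD would be needed at
    the scale discussed in this thread.
    """
    s = np.linalg.svd(X, compute_uv=False)[:K]
    return choose_k_from_singular_values(s, noise_start, thresh)
```

Running `estimate_k` on the full matrix and on the subset, then comparing the two values, is one way to answer the question above about whether the estimated k differs between them.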

ghost · 2024-03-05T00:10:40Z

@yuGithuuub
Hello, ANA111. I, too, work with large datasets in my analyses. I've encountered an issue related to sparseMatrix. Have you faced a similar challenge by any chance?

[Error occurred]
Error in .m2sparse(from, paste0(kind, "g", repr), NULL, NULL):
attempt to construct sparseMatrix with more than 2^31-1 nonzero entries
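
The limit in that message is 2^31-1 = 2,147,483,647 nonzero entries. A rough back-of-the-envelope check, sketched below, shows how easily a data set of this size can cross it. The gene count and density used in the usage note are made-up illustrative numbers, not values from this thread, and the helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the limit quoted in the error message


def estimated_nonzeros(n_cells, n_genes, density):
    """Approximate number of nonzero entries for a given fill fraction."""
    return int(n_cells * n_genes * density)


def exceeds_limit(nnz, limit=INT32_MAX):
    """True if a single sparse matrix with nnz entries would hit the limit."""
    return nnz > limit


def min_chunks(nnz, limit=INT32_MAX):
    """Smallest number of equally sized pieces that each stay within the limit."""
    return -(-nnz // limit)  # ceiling division on integers
```

For example, with 600,000 cells, an assumed 20,000 genes, and an assumed density of 0.25, `estimated_nonzeros(600_000, 20_000, 0.25)` gives 3,000,000,000 entries, which is above the limit, and `min_chunks` reports that the matrix would have to be split into at least 2 pieces.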
