
Sparse input data #102

Open
ml3958 opened this issue Sep 21, 2021 · 3 comments


ml3958 commented Sep 21, 2021

Hi,

I have a sparse input matrix and I tried flashr (with non-negative constraints on both the F and L matrices), but I only get very few factors. I wonder whether this has something to do with the sparsity of my input data (53% zeros), and what would be the best practice?

Thanks so much in advance!

stephens999 (Contributor) commented Sep 21, 2021

First, for sparse matrices, I recommend our improved version of flashr at https://github.com/willwerscheid/flashier. It should be faster for sparse matrices. It is also a bit more "in development" right now, but we are actively working on it, so we can provide advice etc. there.
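A minimal sketch of fitting flashier to a sparse matrix is below. This is a hypothetical example, not taken from the thread: it assumes flashier's `flash()` entry point and the `greedy_Kmax` argument described in the package README, and the argument names may differ between package versions — check the current flashier documentation before running.

```r
# Hypothetical sketch: EBMF via flashier on a sparse matrix.
# Assumes flashier's flash() interface; argument names may vary by version.
library(Matrix)
library(flashier)

# Simulated sparse data with roughly the sparsity described in the question
# (~53% zeros); rsparsematrix() is from the Matrix package.
Y <- rsparsematrix(1000, 200, density = 0.47)

# Greedily add up to 20 factors with the default (unconstrained) priors.
fit <- flash(Y, greedy_Kmax = 20)
```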

However, although in principle flashr and flashier can do EBMF with non-negative priors, I would not really recommend them (whether the data are sparse or not). In particular, we know that convergence can be an issue with non-negative priors.

My recommendation, if this is a sparse count matrix and you want a non-negative factorization, would be to use Poisson non-negative matrix factorization, as in https://github.com/stephenslab/fastTopics, where we have worked much harder on good convergence, and where the count nature of the data is better modeled.
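A short sketch of the Poisson NMF route is below. This is a hypothetical example assuming fastTopics' `fit_poisson_nmf()` function (the main fitting routine documented in that repository); the simulated count matrix and choice of `k` are illustrative only.

```r
# Hypothetical sketch: Poisson NMF on a sparse count matrix with fastTopics.
# fit_poisson_nmf() is the main fitting function documented in the
# fastTopics repository; see its help page for tuning options.
library(Matrix)
library(fastTopics)

# Simulated sparse non-negative count data (illustrative only).
X <- rsparsematrix(1000, 200, density = 0.47,
                   rand.x = function(n) rpois(n, 3) + 1)

# Fit a rank-10 Poisson NMF; k is the number of factors ("topics").
fit <- fit_poisson_nmf(X, k = 10)
```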

We are also working on semi-nonnegative approaches in flashier (where loadings are non-negative and factors are not), and these might also be of interest (but continue the conversation on that in the flashier repo if you are interested).
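The semi-nonnegative setup can be sketched as follows. This is a hypothetical example: it assumes flashier's `ebnm_fn` argument accepts a pair of prior families (one for loadings, one for factors) and uses the `ebnm_point_exponential` and `ebnm_point_normal` prior families from the ebnm package — verify both against the current flashier and ebnm documentation.

```r
# Hypothetical sketch: semi-nonnegative EBMF in flashier.
# Loadings get a non-negative (point-exponential) prior; factors get an
# unconstrained (point-normal) prior. Prior-family names are from the
# ebnm package; the two-element ebnm_fn convention is assumed here.
library(Matrix)
library(flashier)

Y <- rsparsematrix(1000, 200, density = 0.47)  # illustrative sparse data

fit <- flash(Y,
             ebnm_fn = c(ebnm_point_exponential,  # non-negative loadings
                         ebnm_point_normal),      # unconstrained factors
             greedy_Kmax = 20)
```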

ml3958 (Author) commented Sep 21, 2021

Thanks so much for your reply. I will look into flashier.

In terms of what you mentioned:

"However, although in principle flashr and flashier can do EBMF with non-negative priors, I would not really recommend them (whether the data are sparse or not). In particular, we know that convergence can be an issue with non-negative priors."

I wonder whether such difficulty in convergence would lead to fewer factors than expected? I ran flashr with non-negative priors multiple times and the results are pretty consistent (few factors, but very reproducible), so I suspect that in my case convergence was not an issue. However, I do have a very complex dataset and I expect many factors.

I didn't consider fastTopics because my input is not count data, but rather normalized statistics.

Thank you!

stephens999 (Contributor) commented Sep 21, 2021 via email
