
Sparse input data #102

Open
ml3958 opened this issue Sep 21, 2021 · 3 comments


ml3958 commented Sep 21, 2021

Hi,

I have a sparse input matrix and I tried flashr (with non-negative constraints on both the F and L matrices), but I only get very few factors. I wonder whether this has something to do with the sparsity of my input data (53% zeros), and what would be the best practice?

Thanks so much in advance!

stephens999 (Contributor) commented Sep 21, 2021

First, for sparse matrices, I recommend our improved version of flashr at https://github.com/willwerscheid/flashier. It should be faster for sparse matrices. It is also a bit more "in development" right now, but we are actively working on it, so we can provide advice etc. there.
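A minimal sketch of fitting flashier to a sparse matrix is below. This is a hypothetical example, not taken from the thread: it assumes flashier's `flash()` entry point and the `greedy_Kmax` argument described in the package README, and the argument names may differ between package versions — check the current flashier documentation before running.

```r
# Hypothetical sketch: EBMF via flashier on a sparse matrix.
# Assumes flashier's flash() interface; argument names may vary by version.
library(Matrix)
library(flashier)

# Simulated sparse data with roughly the sparsity described in the question
# (~53% zeros); rsparsematrix() is from the Matrix package.
Y <- rsparsematrix(1000, 200, density = 0.47)

# Greedily add up to 20 factors with the default (unconstrained) priors.
fit <- flash(Y, greedy_Kmax = 20)
```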

However, although in principle flashr and flashier can do EBMF with non-negative priors, I would not really recommend them (whether the data are sparse or not). In particular, we know that convergence can be an issue with non-negative priors.

My recommendation, if this is a sparse count matrix and you want a non-negative factorization, would be to use Poisson non-negative matrix factorization, as in https://github.com/stephenslab/fastTopics, where we have worked much harder on good convergence, and where the count nature of the data is better modeled.
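A short sketch of the Poisson NMF route is below. This is a hypothetical example assuming fastTopics' `fit_poisson_nmf()` function (the main fitting routine documented in that repository); the simulated count matrix and choice of `k` are illustrative only.

```r
# Hypothetical sketch: Poisson NMF on a sparse count matrix with fastTopics.
# fit_poisson_nmf() is the main fitting function documented in the
# fastTopics repository; see its help page for tuning options.
library(Matrix)
library(fastTopics)

# Simulated sparse non-negative count data (illustrative only).
X <- rsparsematrix(1000, 200, density = 0.47,
                   rand.x = function(n) rpois(n, 3) + 1)

# Fit a rank-10 Poisson NMF; k is the number of factors ("topics").
fit <- fit_poisson_nmf(X, k = 10)
```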

We are also working on semi-nonnegative approaches in flashier (where loadings are non-negative and factors are not), and these might also be of interest (but continue the conversation on that in the flashier repo if you are interested).
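The semi-nonnegative setup can be sketched as follows. This is a hypothetical example: it assumes flashier's `ebnm_fn` argument accepts a pair of prior families (one for loadings, one for factors) and uses the `ebnm_point_exponential` and `ebnm_point_normal` prior families from the ebnm package — verify both against the current flashier and ebnm documentation.

```r
# Hypothetical sketch: semi-nonnegative EBMF in flashier.
# Loadings get a non-negative (point-exponential) prior; factors get an
# unconstrained (point-normal) prior. Prior-family names are from the
# ebnm package; the two-element ebnm_fn convention is assumed here.
library(Matrix)
library(flashier)

Y <- rsparsematrix(1000, 200, density = 0.47)  # illustrative sparse data

fit <- flash(Y,
             ebnm_fn = c(ebnm_point_exponential,  # non-negative loadings
                         ebnm_point_normal),      # unconstrained factors
             greedy_Kmax = 20)
```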

ml3958 (Author) commented Sep 21, 2021

Thanks so much for your reply. I will look into flashier.

In terms of what you mentioned:

"However, although in principle flashr and flashier can do EBMF with non-negative priors, I would not really recommend them (whether the data are sparse or not). In particular, we know that convergence can be an issue with non-negative priors."

I wonder whether such difficulty in convergence would lead to fewer factors than expected? I ran flashr with non-negative priors multiple times and the results are pretty consistent (few factors, but very reproducible), so I suspect that in my case convergence was not an issue. However, I do have a very complex dataset and I expect many factors.

I didn't consider fastTopics because my input is not count data, but rather normalized statistics.

Thank you!

stephens999 (Contributor) commented Sep 21, 2021 via email
