
norm of the discrepancy between the computed approximation UV* and the matrix A being decomposed #4

Open
Tagar opened this issue Jan 29, 2019 · 2 comments

Comments


Tagar commented Jan 29, 2019

Facebook did a great job on comparing their own distributed SVD implementation to stock Spark ML one.

http://tygert.com/spark.pdf

See for example this table -

[image: error-comparison table from the paper]

Do you have an estimate for the norm of the discrepancy between the computed approximation UV* and the matrix A being decomposed?

Thank you.

@Ivanopolo
Contributor

Hi @Tagar! It is a very good question. We have unit tests that validate the quality of reconstruction on toy matrices, but we do not do that for the full-scale matrices, since it is not trivial: computing the reconstruction error over the full matrix takes O(m × n) operations, which is usually not feasible unless you use some sort of subsampling. Internally, we use other metrics to validate the quality of the extracted eigenvectors (basically, we try them on our recommendation task), and that satisfies us.
But we would love to see contributions related to this! Any external evaluation, or a method to efficiently compute these metrics, would be very welcome. If you need any help or advice on how best to do that, just let me know.
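For reference, the subsampling idea mentioned above can be sketched roughly as follows. This is only an illustration, not code from this repo; the function name and the dense NumPy representation of the factors are assumptions. It estimates the relative Frobenius error of a truncated SVD from a random sample of entries, so the cost is O(samples) rather than O(m × n):

```python
# Hypothetical sketch: estimate ||A - U diag(s) V*||_F / ||A||_F from a
# random sample of entries instead of reconstructing the full matrix.
import numpy as np

def sampled_frobenius_error(A, U, s, Vt, n_samples=10000, seed=0):
    """Sample entries (with replacement) and compare A against U diag(s) Vt."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    rows = rng.integers(0, m, n_samples)
    cols = rng.integers(0, n, n_samples)
    a = A[rows, cols]
    # Entry (i, j) of the low-rank approximation:
    # sum_k U[i, k] * s[k] * Vt[k, j], evaluated only at sampled positions.
    approx = np.einsum('ik,k,ki->i', U[rows], s, Vt[:, cols])
    num = np.sqrt(np.mean((a - approx) ** 2))  # estimates ||A - UsVt||_F / sqrt(mn)
    den = np.sqrt(np.mean(a ** 2))             # estimates ||A||_F / sqrt(mn)
    return num / den
```

Both the numerator and denominator are unbiased estimates of the corresponding mean squared entries, so the ratio converges to the true relative Frobenius error as the sample grows.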


tygert commented Jan 31, 2019

@Ivanopolo, @Tagar is right that the accuracy of the randomized methods can be extremely finicky for large-scale matrices. http://tygert.com/spark.pdf uses the power method to ascertain the accuracy. Attaining good accuracy for nearly rank-deficient matrices (which are the rule rather than the exception in our experience at Facebook) requires care with the distributed QR.

Algorithms which produce (in fact, guarantee) good accuracy are available at http://tygert.com/spark.pdf, with implementations at http://github.com/hl475/svd and expository prototypes in Python at http://tygert.com/valid.tar.gz

Thanks for the undeserved compliment, @Tagar, and for looking out for all of us in the open-source community! Facebook hopes these methods and the software (written at Yale by @hl475 before he joined Facebook) can be valuable to you.

P.S. Some systems at Facebook retrofit existing codes for alternating least squares via the trick elaborated at http://tygert.com/als.pdf (admittedly inelegant, but accurate and convenient if the approach at http://tygert.com/spark.pdf would require too much work to implement).
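To make the power-method idea concrete, here is a minimal sketch (not the code from http://github.com/hl475/svd; the function name and dense-NumPy factors are assumptions). It estimates the spectral norm of the discrepancy B = A − U diag(s) V* by power iteration on BᵀB, applying B only through matrix-vector products so the m × n residual is never formed:

```python
# Hypothetical sketch: power method for ||A - U diag(s) Vt||_2,
# touching A only via matvecs (cost per iteration: a few O(mn) products,
# or much less if A itself is sparse / implicit).
import numpy as np

def discrepancy_spectral_norm(A, U, s, Vt, n_iter=100, seed=0):
    """Estimate sigma_max(B) for B = A - U @ np.diag(s) @ Vt."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    sigma_sq = 0.0
    for _ in range(n_iter):
        y = A @ x - U @ (s * (Vt @ x))        # y = B x
        x = A.T @ y - Vt.T @ (s * (U.T @ y))  # x = B^T y
        sigma_sq = np.linalg.norm(x)          # converges to sigma_max(B)^2
        if sigma_sq == 0.0:
            break
        x /= sigma_sq
    return np.sqrt(sigma_sq)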
