
Allows large data? #5

Open
ozgunakalin opened this issue Dec 7, 2016 · 2 comments

Comments

@ozgunakalin

Hello,

I currently use scikit-learn's TSNE, and it is not very memory-friendly. I wonder how this project compares in terms of the number of rows it can handle. Thanks.

@saurfang
Owner

That was the hope, but then I found that I needed a scalable kNN implementation, which diverted me to work on https://github.com/saurfang/spark-knn. Unfortunately I no longer have time to pursue this project. However, I am happy to answer any questions or review any contributions.

@kartha01

kartha01 commented Jun 26, 2017

Curious, has anyone been able to run this on large datasets? I was wondering what issues you ran into and what the approximate run times were.

I am using a 3GB dataset with 100 features, and so far I have had to update the following properties with new values:

"spark.rpc.askTimeout=1000"
"spark.akka.frameSize=256"
"spark.driver.maxResultSize=2G"
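For anyone trying to reproduce this, these overrides can also be passed as `--conf` flags at launch time rather than edited into a properties file. A minimal sketch, assuming a `spark-submit` launch; the JAR name and master URL below are placeholders, not from this thread:

```shell
# Hypothetical spark-submit invocation carrying the three property
# overrides mentioned above (values taken from this comment).
spark-submit \
  --master yarn \
  --conf spark.rpc.askTimeout=1000 \
  --conf spark.akka.frameSize=256 \
  --conf spark.driver.maxResultSize=2G \
  your-tsne-app.jar
```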

to fix the exceptions I ran into. Also, the driver and executors need lots of memory; I am using 10G for each (with 12 executors), and the t-SNE is still running after about 14 hrs...

I am using the same approach as shown in the MNIST.scala example:
com/github/saurfang/spark/tsne/examples/MNIST.scala
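In case it saves someone a step, a launch of that example with the resources described above might look roughly like this. This is a sketch: the class name is inferred from the file path, and the assembly JAR name is a placeholder:

```shell
# Hypothetical launch of the MNIST example with the driver/executor
# sizing described in this comment (10G each, 12 executors).
spark-submit \
  --class com.github.saurfang.spark.tsne.examples.MNIST \
  --driver-memory 10G \
  --executor-memory 10G \
  --num-executors 12 \
  spark-tsne-assembly.jar
```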

Any thoughts/ideas on speeding this up?

Regards,
Rajesh
