This project aims to compare how effectively Scala and Python implementations of an ALS movie recommender can be accelerated using GPUs with Spark on a cloud-computing platform. We use Spark MLlib to build the Python and Scala recommenders, and we use the NVIDIA spark-rapids package to integrate an AWS EMR cluster with GPUs. We also compare the speedup between a cluster utilizing GPUs and one with only CPUs. Lastly, we compare how well the equivalent Scala and Python implementations perform on the MovieLens datasets of 100k movie ratings, 1M movie ratings, and 20M movie ratings and 25M movie ratings to measure weak scaling.
forked from benjaminliu1998/CS205_Recommender
-
Notifications
You must be signed in to change notification settings - Fork 0
connorcapitolo/CS205_Recommender
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Python vs Scala: Comparing speed, scalability and performance for a distributed movie recommender using ALS
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Scala 74.8%
- Python 25.2%