forked from qcri/sPCA

sPCA

Scalable PCA (sPCA) is a scalable implementation of principal component analysis (PCA) on top of Spark and MapReduce. sPCA achieves scalability by employing efficient large-matrix operations, leveraging matrix sparsity, and minimizing intermediate data. The repository contains two README files that walk you through running sPCA on Spark and on MapReduce, respectively: the sPCA-Spark README and the sPCA-mapreduce README.
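To illustrate what "leveraging matrix sparsity" can mean in practice, here is a minimal sketch of one well-known trick: multiplying a mean-centered matrix by a vector without ever materializing the dense centered matrix, using the identity (Y - 1·mᵀ)·v = Y·v - (m·v)·1. This is not code from the sPCA repository; the class `SparseCenteredProduct`, the `SparseRow` type, and the method names are hypothetical and chosen only for this example.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical names, not taken from the sPCA source.
final class SparseCenteredProduct {

    /** One sparse row: parallel arrays of column indices and non-zero values. */
    static final class SparseRow {
        final int[] indices;
        final double[] values;

        SparseRow(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }
    }

    /** Column means of a sparse matrix; implicit zeros count toward the mean. */
    static double[] columnMeans(SparseRow[] rows, int numCols) {
        double[] mean = new double[numCols];
        for (SparseRow row : rows) {
            for (int k = 0; k < row.indices.length; k++) {
                mean[row.indices[k]] += row.values[k];
            }
        }
        for (int j = 0; j < numCols; j++) {
            mean[j] /= rows.length;
        }
        return mean;
    }

    /**
     * Computes (Y - 1 * mean^T) * v while touching only the non-zeros of Y.
     * The dense correction term mean . v is computed once and subtracted per row.
     */
    static double[] multiplyCentered(SparseRow[] rows, double[] mean, double[] v) {
        double meanDotV = 0.0;
        for (int j = 0; j < mean.length; j++) {
            meanDotV += mean[j] * v[j];
        }
        double[] result = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            SparseRow row = rows[i];
            double dot = 0.0;
            for (int k = 0; k < row.indices.length; k++) {
                dot += row.values[k] * v[row.indices[k]];
            }
            result[i] = dot - meanDotV;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 3 x 3 matrix with four non-zero entries.
        SparseRow[] y = {
            new SparseRow(new int[] {0}, new double[] {2.0}),
            new SparseRow(new int[] {2}, new double[] {3.0}),
            new SparseRow(new int[] {0, 1}, new double[] {4.0, 6.0})
        };
        double[] mean = columnMeans(y, 3);
        double[] v = {1.0, 1.0, 1.0};
        System.out.println(Arrays.toString(multiplyCentered(y, mean, v)));
    }
}
```

Subtracting the mean from a sparse matrix directly would turn almost every zero into a non-zero, so deferring the correction to a single scalar per product keeps both the work and the intermediate data proportional to the number of non-zeros.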

People

Publications

  • T. Elgamal, M. Yabandeh, A. Aboulnaga, W. Mustafa, and M. Hefeeda. sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. In Proc. of ACM SIGMOD’15, Melbourne, Australia, May 2015. [pdf] [bibtex]

  • T. Elgamal and M. Hefeeda. Analysis of PCA Algorithms in Distributed Environments. Technical Report arXiv:1503.05214. [pdf] [bibtex]

License

sPCA is released under the terms of the MIT License.

Contact

For any issues or enhancement requests, please use the issues page on GitHub or contact us. We will do our best to help you sort it out.
