Matrix Methods in Hadoop

David F. Gleich, Computer Science, Purdue University

These codes accompany my presentation on Matrix Methods in Hadoop at the BIGDATA Techcon in Boston, MA in April 2013. I suspect they'll be used in other presentations as well.

Overview

The goal in these slides is to demonstrate how to implement simple matrix computations in Hadoop using Yelp's mrjob system.

- Sparse matrix-vector products
- Matrix-matrix products
- A recommender system for epinions data
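Every example follows the same pattern. A mapper turns each input record into (key, value) pairs, Hadoop groups the pairs by key (the shuffle), and a reducer processes each group. In mrjob these are the `mapper` and `reducer` methods of an `MRJob` subclass. The sketch below does not use mrjob at all. It is a hypothetical in-memory simulation (`run_mapreduce` is not part of mrjob) that computes the row sums of a sparse matrix stored as `(i, j, value)` triples, an assumed format that may differ from the files in `samples/`.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce pass in memory: map, shuffle (group by key), reduce.

    Hypothetical helper for illustration; it is not an mrjob API.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": collect values per key
    output = []
    for key in sorted(groups):  # Hadoop also hands keys to reducers in sorted order
        output.extend(reducer(key, groups[key]))
    return output


def rowsum_mapper(entry):
    # Assumed input: one (i, j, a_ij) triple per record; key by row index i.
    i, _j, a_ij = entry
    yield i, a_ij


def rowsum_reducer(i, values):
    # All nonzeros of row i arrive together, so the row sum is a plain sum.
    yield i, sum(values)


if __name__ == "__main__":
    A = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    print(run_mapreduce(A, rowsum_mapper, rowsum_reducer))
    # [(0, 3.0), (1, 3.0), (2, 9.0)]
```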

Getting started

Get mrjob working. Nothing here requires an actual MapReduce cluster, but feel free to use one if you wish! I set up a virtualenv for this and use pip:
```
 mkdir envs
 virtualenv envs/mrjob
 source envs/mrjob/bin/activate
 pip install mrjob
```
Get the datasets for the recommender system:
```
 make getdata
```

Run some examples

Sparse matrix-vector products

```
 python codes/smatvec.py samples/smat_10_5_A.txt samples/vec_5.txt

 # Compare the output to a non-MR computation
 python codes/test_smatvec.py samples/smat_10_5_A.txt samples/vec_5.txt
```
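A sparse matrix-vector product y = Ax needs each nonzero a_ij to meet the vector entry x_j. One standard way to arrange this in MapReduce, shown here as a pure-Python simulation rather than the actual contents of `codes/smatvec.py`, uses two passes: the first keys both A and x by the column index j and emits each product a_ij * x_j keyed by its row i; the second sums those products for each row. The `(i, j, a_ij)` and `(j, x_j)` input formats and the names `smatvec` and `group_by_key` are assumptions for illustration.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Simulate the shuffle: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def smatvec(A_entries, x_entries):
    """Compute y = A x with two simulated MapReduce passes.

    A_entries: (i, j, a_ij) triples; x_entries: (j, x_j) pairs.
    Both formats are assumptions for this sketch.
    """
    # Pass 1 map: key every record by column index j so that column j of A
    # and x_j arrive at the same reducer. The tag tells the two sources apart.
    tagged = [(j, ("A", i, a_ij)) for i, j, a_ij in A_entries]
    tagged += [(j, ("x", None, x_j)) for j, x_j in x_entries]

    # Pass 1 reduce: multiply each a_ij by x_j and re-key the product by row i.
    products = []
    for _j, values in group_by_key(tagged).items():
        x_vals = [val for tag, _, val in values if tag == "x"]
        if not x_vals:
            continue  # x_j is an implicit zero, so column j contributes nothing
        for tag, i, a_ij in values:
            if tag == "A":
                products.append((i, a_ij * x_vals[0]))

    # Pass 2 (identity map): sum the products for each row i.
    return {i: sum(terms) for i, terms in group_by_key(products).items()}


if __name__ == "__main__":
    A = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    x = [(0, 1.0), (1, 2.0), (2, 3.0)]
    print(sorted(smatvec(A, x).items()))  # [(0, 7.0), (1, 6.0), (2, 19.0)]
```

Rows whose products are all missing do not appear in the result, the usual convention for sparse output. When x is small enough to fit in memory, a single map-only pass that loads x into every mapper avoids this join entirely.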

Sparse matrix-matrix products

```
 python codes/matmat.py samples/smat_10_5_A.txt samples/smat_5_5.txt

 # Compare the output to a non-MR computation
 python codes/test_smatmat.py samples/smat_10_5_A.txt samples/smat_5_5.txt
```
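The same join idea extends to C = AB: key each a_ik and each b_kj by the shared inner index k, let the reducer for k form every product a_ik * b_kj keyed by (i, j), and sum those products in a second pass. This is a hypothetical in-memory sketch with assumed `(row, col, value)` triples, not necessarily the approach taken in `codes/matmat.py`.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Simulate the shuffle: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def smatmat(A_entries, B_entries):
    """Compute C = A B with two simulated MapReduce passes.

    A_entries: (i, k, a_ik) triples; B_entries: (k, j, b_kj) triples.
    Both formats are assumptions for this sketch.
    """
    # Pass 1 map: key A by its column k and B by its row k.
    tagged = [(k, ("A", i, a_ik)) for i, k, a_ik in A_entries]
    tagged += [(k, ("B", j, b_kj)) for k, j, b_kj in B_entries]

    # Pass 1 reduce: the reducer for k sees column k of A and row k of B
    # and emits their outer product, one term per (i, j).
    products = []
    for _k, values in group_by_key(tagged).items():
        a_col = [(i, v) for tag, i, v in values if tag == "A"]
        b_row = [(j, v) for tag, j, v in values if tag == "B"]
        for i, a_ik in a_col:
            for j, b_kj in b_row:
                products.append(((i, j), a_ik * b_kj))

    # Pass 2 (identity map): sum the terms for each output entry (i, j).
    return {ij: sum(terms) for ij, terms in group_by_key(products).items()}


if __name__ == "__main__":
    A = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]  # [[1, 2], [0, 3]]
    B = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]  # [[4, 0], [5, 6]]
    print(sorted(smatmat(A, B).items()))
    # [((0, 0), 14.0), ((0, 1), 12.0), ((1, 0), 15.0), ((1, 1), 18.0)]
```

Because the reducer for k emits a full outer product, a single dense column of A or row of B can make one reducer, and the shuffle, very large.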

Run the recommender system

Warning: this takes a while. I'm not sure where the bottleneck is.

```
 python recsys/recsys.py data/rating.txt.gz data/user_ratings.txt.gz
```
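This README does not describe the algorithm in `recsys/recsys.py`. As a hypothetical illustration of a matrix-based recommender, one common formulation counts how often two items were rated by the same user, which is the off-diagonal part of A^T A for the 0/1 user-by-item matrix A, and then recommends the items that co-occur most with the items a user already rated. The `(user, item, rating)` format and the names `item_cooccurrence` and `recommend` are assumptions, and the sketch ignores the rating values.

```python
from collections import defaultdict
from itertools import combinations


def item_cooccurrence(ratings):
    """Return {(a, b): count} with a < b: the upper triangle of A^T A, A 0/1.

    ratings: (user, item, rating) triples (hypothetical format).
    """
    # Pass 1 (map keyed by user): gather the set of items each user rated.
    items_by_user = defaultdict(set)
    for user, item, _rating in ratings:
        items_by_user[user].add(item)
    # Pass 2: each user emits one record per pair of rated items (map),
    # and the records for each pair are counted (reduce).
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


def recommend(user_items, cooc, k=3):
    """Score unrated items by their co-occurrence with the user's items."""
    scores = defaultdict(int)
    for (a, b), count in cooc.items():
        if a in user_items and b not in user_items:
            scores[b] += count
        elif b in user_items and a not in user_items:
            scores[a] += count
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


if __name__ == "__main__":
    ratings = [("u1", "i1", 5), ("u1", "i2", 4),
               ("u2", "i1", 3), ("u2", "i2", 1), ("u2", "i3", 2)]
    cooc = item_cooccurrence(ratings)
    print(cooc)  # {('i1', 'i2'): 2, ('i1', 'i3'): 1, ('i2', 'i3'): 1}
    print(recommend({"i1"}, cooc))  # ['i2', 'i3']
```

In this formulation a user with d ratings emits d(d-1)/2 pair records, so a few heavy users can dominate the shuffle.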
