Updated to work with the newer dataset. The input for this project used to be a tab-separated text file; the dataset is now distributed in a different format (CSV for the review subsets).
This fork of X-MAP aims to bring compatibility with the newer datasets. The current update still works only with TSV input, so scripts are provided to convert the newer dataset into that format.
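The conversion scripts themselves are not reproduced here. As a rough sketch of the idea, the function below rewrites CSV rating rows into the tab-separated <userid>, <itemid>, <rating>, <timestamp> layout that X-MAP reads. It assumes the CSV has no header row and that its columns are item, user, rating, timestamp; that column order is an assumption about the newer files, so check it against the subset you downloaded and adjust CSV_COLUMNS. The function name csv_to_tsv is hypothetical.

```python
import csv

# ASSUMPTION: column order of the newer CSV ratings files (no header row).
# Verify against your download and edit this tuple if it differs.
CSV_COLUMNS = ("item", "user", "rating", "timestamp")
# Field order X-MAP expects: <userid>\t<itemid>\t<rating>\t<timestamp>
TSV_COLUMNS = ("user", "item", "rating", "timestamp")


def csv_to_tsv(src, dst, columns=CSV_COLUMNS):
    """Rewrite CSV rating rows from `src` as TSV lines in X-MAP's order.

    `src` and `dst` are open text file objects. Returns the number of rows written.
    """
    count = 0
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        record = dict(zip(columns, row))  # name each CSV field
        dst.write("\t".join(record[name] for name in TSV_COLUMNS) + "\n")
        count += 1
    return count
```

For example, `with open("ratings.csv", newline="") as src, open("ratings.tsv", "w") as dst: csv_to_tsv(src, dst)` (file names are placeholders). If your file does have a header row, skip it before calling the function.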
I recommend looking into this Docker Spark cluster for running X-MAP, although the Dockerfiles provided in that repository will need some modifications.
X-MAP is a large-scale heterogeneous recommender which is built on top of Apache Spark and implemented in Python.
- Provides heterogeneous recommendation based on artificial AlterEgos of users across multiple application domains.
- Any classical homogeneous recommendation algorithm can be run in the target application domain using these AlterEgos.
- Provides formal privacy guarantees.
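To make the AlterEgo idea concrete, here is a deliberately simplified sketch, not X-MAP's actual generator: each item a user rated in the source domain is replaced by its most similar target-domain item, and the rating is carried over. The function name alterego_profile and the shape of the cross_sim table are hypothetical; X-MAP computes its cross-domain similarities with its own pipelines and adds the privacy protection that this sketch omits.

```python
def alterego_profile(source_profile, cross_sim):
    """Map one user's source-domain profile into the target domain (simplified).

    source_profile: {source_item: rating}
    cross_sim:      {source_item: {target_item: similarity}}  (hypothetical input)
    Returns {target_item: rating}.
    """
    collected = {}  # target item -> ratings of the source items mapped onto it
    for item, rating in source_profile.items():
        neighbours = cross_sim.get(item)
        if not neighbours:
            continue  # no known cross-domain neighbour: drop this source item
        # Highest-similarity target item; scanning in sorted order makes ties
        # resolve to the alphabetically first item, so the result is deterministic.
        best = max(sorted(neighbours), key=lambda t: neighbours[t])
        collected.setdefault(best, []).append(rating)
    # Average the ratings of source items that landed on the same target item.
    return {t: sum(rs) / len(rs) for t, rs in collected.items()}
```

Once such a profile exists in the target domain, any homogeneous recommender can consume it as if it were a native user profile, which is the point of the second bullet above.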
X-MAP requires Python 3, NumPy 1.10.4, and Apache Spark 1.6.1 to be pre-installed on your machine.
Please refer to the Anaconda and Apache Spark documentation for installation instructions.
We also provide a Docker image for your convenience. If you do not have docker and docker-compose installed on your machine, please follow their official installation guides.
Next, open a console, go to the platform folder, and run the following command to set up the corresponding Docker image:
docker-compose build
Once you have modified the scripts in the X-MAP folder, rebuild the package using the following command:
python setup.py install
We provide an egg file, located at dist/xmap-0.1.0-py3.5.egg, that you can use for your application.
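An egg is a zip archive, so outside of spark-submit you can also make its packages importable by putting the egg on sys.path, where Python's built-in zipimport machinery loads pure-Python packages from it. A minimal sketch, assuming the egg contains a top-level xmap package; the helper name import_from_egg is hypothetical.

```python
import importlib
import sys


def import_from_egg(egg_path, package="xmap"):
    """Put a zipped egg on sys.path and return the imported package."""
    if egg_path not in sys.path:
        sys.path.insert(0, egg_path)  # zipimport handles zip archives on sys.path
    importlib.invalidate_caches()     # make sure the new path entry is picked up
    return importlib.import_module(package)
```

For example, `xmap = import_from_egg("dist/xmap-0.1.0-py3.5.egg")`. This only affects the local interpreter; in a Spark job you still need --py-files (or SparkContext.addPyFile) so the executors receive the egg as well.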
X-MAP has been tested on real traces from Amazon. In the current implementation, each line of the input data has the following format:
<userid>\t<itemid>\t<rating>\t<timestamp>
Note that the timestamp is required if you want to implement algorithms that incorporate the temporal behaviour of users, which AlterEgos also support.
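A minimal sketch of parsing one input line into a record. Treating the timestamp as optional is an assumption made here for illustration (it is only needed by time-aware algorithms), as is reading it as integer Unix seconds; the names Rating and parse_rating_line are hypothetical.

```python
from collections import namedtuple

# One rating record; timestamp is None when the fourth column is absent.
Rating = namedtuple("Rating", ["user", "item", "rating", "timestamp"])


def parse_rating_line(line):
    r"""Parse '<userid>\t<itemid>\t<rating>\t<timestamp>' (timestamp optional)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 tab-separated fields, got %d" % len(fields))
    # ASSUMPTION: timestamps are integer Unix seconds.
    timestamp = int(fields[3]) if len(fields) == 4 else None
    return Rating(fields[0], fields[1], float(fields[2]), timestamp)
```

With Spark, the same function can be applied to a whole file, e.g. `sc.textFile(path).map(parse_rating_line)`.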
We provide two demonstrations, twodomain_demo.py and multidomain_demo.py. You can tune their parameters in the file parameters.yaml.
Note that the scripts should run successfully in the Docker image we provide. When running them elsewhere, check your local system settings (e.g., directory paths).
A simple example of how to run X-MAP on a local machine.
spark-submit --master local[4] \
--py-files dist/xmap-0.1.0-py3.5.egg twodomain_demo.py
A simple example of how to run X-MAP on a cluster of machines.
spark-submit --py-files xmap-0.1.0-py3.5.egg \
--num-executors 30 --executor-cores 3 --executor-memory 12g \
--driver-memory 12g --driver-cores 4 twodomain_demo.py
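If you launch runs from Python (for example, to sweep executor settings), a small helper can assemble the same spark-submit command lines as above. The helper name spark_submit_cmd is hypothetical; it only uses the standard spark-submit flags already shown and only builds the argument list.

```python
def spark_submit_cmd(script, egg="dist/xmap-0.1.0-py3.5.egg", master=None, **resources):
    """Build a spark-submit argument list.

    Keyword arguments become flags: num_executors=30 -> --num-executors 30.
    Pass master="local[4]" for a local run; omit it to use the cluster default.
    """
    cmd = ["spark-submit"]
    if master is not None:
        cmd += ["--master", master]
    cmd += ["--py-files", egg]
    # Sorted for a stable order; spark-submit accepts options in any order.
    for name, value in sorted(resources.items()):
        cmd += ["--" + name.replace("_", "-"), str(value)]
    cmd.append(script)  # the application file goes after all options
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with values such as local[4].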
X-MAP can be combined with any publicly available recommender library. Below is an example that uses Spark's built-in MLlib library with X-MAP.
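from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from xmap.core import *

# Use the components in xmap to build the AlterEgo profiles.
sourceRDD = baseliner_clean_data_pipeline(...)
targetRDD = baseliner_clean_data_pipeline(...)
trainRDD, testRDD = baseliner_split_data_pipeline(...)
item2item_simRDD = baseliner_calculate_sim_pipeline(...)
extendedsimRDD = extender_pipeline(...)
alterEgo_profileRDD = generator_pipeline(...)

# pyspark.ml's ALS works on DataFrames, not RDDs, so convert first.
# ASSUMPTION: an active SQLContext named sqlContext (as in the pyspark shell)
# and RDDs of (user, item, rating) tuples with integer ids; adapt this to
# whatever the xmap pipelines actually return.
train_df = sqlContext.createDataFrame(alterEgo_profileRDD, ["user", "item", "rating"])
test_df = sqlContext.createDataFrame(testRDD, ["user", "item", "rating"])

# Use MLlib's ALS to do matrix factorization on the AlterEgo profiles.
als = ALS(...)
model = als.fit(train_df)
predictions = model.transform(test_df)
evaluator = RegressionEvaluator(...)
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))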
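For small sanity checks, the metric that RegressionEvaluator reports with metricName="rmse" (its default), the square root of the mean squared difference between rating and prediction, can be reproduced in plain Python. This sketch assumes you have collected (rating, prediction) pairs, e.g. via `predictions.select("rating", "prediction").collect()`; skipping NaN predictions (which ALS in Spark 1.6 produces for users or items unseen during training) is a choice of this sketch, not of the evaluator.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (rating, prediction) pairs, ignoring NaN predictions."""
    squared = [(r - p) ** 2 for r, p in pairs if not math.isnan(p)]
    if not squared:
        raise ValueError("no valid predictions to evaluate")
    return math.sqrt(sum(squared) / len(squared))
```

If your predictions contain NaN values, the evaluator itself will return NaN, so filtering them out before evaluation (as done here) is usually needed.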
Please report bugs on GitHub. If you have an open-ended or research-related question, you can post it in the X-MAP group.