A Python wrapper for MADlib, an open-source library of scalable in-database machine learning algorithms.
PyMADlib currently provides wrappers for the following MADlib algorithms:
- Linear Regression
- Logistic Regression
- SVM (regression & classification)
- K-Means
- LDA
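Each wrapper ultimately runs a MADlib routine inside the database, so only the SQL call and its results cross the wire. Below is a minimal sketch of that pattern. It calls MADlib's `linregr_train` directly over a psycopg2 connection rather than going through PyMADlib, and it assumes a MADlib version that provides `linregr_train`, installed in the `madlib` schema. PyMADlib's own wrapper may issue different SQL. The helper names `build_linregr_call` and `train_linear_model` are hypothetical.

```python
from typing import Tuple


def build_linregr_call(source_table: str, out_table: str,
                       dependent_var: str, independent_var: str,
                       schema: str = "madlib") -> Tuple[str, Tuple[str, str, str, str]]:
    """Build a parameterized call to MADlib's linregr_train (hypothetical helper).

    linregr_train takes the table names and column expressions as text
    arguments, so they travel as ordinary bind parameters.
    """
    # A schema name cannot be a bind parameter, so only plain identifiers are allowed.
    if not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    sql = f"SELECT {schema}.linregr_train(%s, %s, %s, %s)"
    return sql, (source_table, out_table, dependent_var, independent_var)


def train_linear_model(conn, source_table: str, out_table: str,
                       dependent_var: str, independent_var: str):
    """Train in-database over an open psycopg2 connection; return (coef, r2)."""
    # The output table name is interpolated below, so restrict it to an identifier.
    if not out_table.isidentifier():
        raise ValueError(f"unsafe output table name: {out_table!r}")
    sql, params = build_linregr_call(source_table, out_table,
                                     dependent_var, independent_var)
    with conn.cursor() as cur:
        cur.execute(sql, params)  # the regression itself runs inside the database
        cur.execute(f"SELECT coef, r2 FROM {out_table}")
        result = cur.fetchone()
    conn.commit()  # persist the model table that linregr_train created
    return result
```

For example, `train_linear_model(conn, "wine", "wine_model", "quality", "ARRAY[1, alcohol]")` would fit a model on a hypothetical `wine` table. MADlib expects the intercept as an explicit `1` in the independent-variable array.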
Refer to the MADlib User Docs for MADlib's user documentation.
- You'll need the Python package psycopg2 to use PyMADlib.
- If you have matplotlib installed, you'll see Matplotlib visualizations in the Linear Regression demo.
- If you have NetworkX installed, you'll see a visualization in the K-Means demo.
- PyROC is included in the source of this distribution with permission from its developer. You'll see a visualization of the ROC curves in the Logistic Regression demo.
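Because matplotlib and networkx are optional, you may want to check in advance which demo visualizations will appear. The sketch below uses only the standard library. The module-to-feature mapping restates the list above, and `check_dependencies` is a hypothetical helper, not part of PyMADlib. PyROC needs no check because it ships with the distribution.

```python
import importlib.util
from typing import Dict

# Mirrors the list above: psycopg2 is required; the others only enable plots.
DEPENDENCIES = {
    "psycopg2": "required: database connection",
    "matplotlib": "optional: Linear Regression demo plots",
    "networkx": "optional: K-Means demo visualization",
}


def check_dependencies() -> Dict[str, bool]:
    """Return {module name: importable?} for each dependency listed above."""
    # find_spec locates a top-level module without executing it; None means missing.
    return {name: importlib.util.find_spec(name) is not None for name in DEPENDENCIES}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:<11} {'found' if found else 'missing':<8} {DEPENDENCIES[name]}")
```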
To configure your database connection parameters, create a file in your home directory:
~/.pymadlib.config
with contents like the following:
[db_connection]
user = gpadmin
password = XXXXX
hostname = 127.0.0.1
port = 5432
database = vatsandb
Set hostname to the IP address of your database server, port to its port number, and database to the database you want to connect to.
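The values in this file map directly onto psycopg2's connection keywords. Below is a minimal sketch using the standard-library `configparser`. It assumes the file has a `[db_connection]` section with one key per line and no inline comments. PyMADlib may read the file differently, and `read_db_params` and `connect` are hypothetical helper names.

```python
import configparser
import os
from typing import Dict

CONFIG_PATH = os.path.expanduser("~/.pymadlib.config")


def read_db_params(path: str = CONFIG_PATH) -> Dict[str, str]:
    """Read [db_connection] and rename keys to psycopg2.connect() keywords."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    section = parser["db_connection"]
    return {
        "user": section["user"],
        "password": section["password"],
        "host": section["hostname"],      # psycopg2 calls this "host"
        "port": section.get("port", "5432"),
        "dbname": section["database"],    # psycopg2 calls this "dbname"
    }


def connect(path: str = CONFIG_PATH):
    """Open a psycopg2 connection using the parameters from the config file."""
    import psycopg2  # imported lazily so the parser works without psycopg2
    return psycopg2.connect(**read_db_params(path))
```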
- You can install PyMADlib by downloading the source from PyPI and running the following:
sudo python setup.py build
sudo python setup.py install
- If you use easy_install or pip, simply run one of:
sudo easy_install pymadlib
sudo pip install pymadlib
Visit the PyMADlib Tutorial for a tutorial on using PyMADlib. You can also visit PyMADlib IPython NB to download the tutorial as an IPython notebook.
You can run the demos from the directory where you extracted the pymadlib source:
python example.py
If you installed PyMADlib using the instructions in the previous section, simply run:
python -c 'from pymadlib.example import runDemos; runDemos()'
Remember to close the Matplotlib windows that pop up so the rest of the demo can continue.
PyMADlib packages publicly available datasets from the UCI machine learning repository and other sources.
- Wine Quality dataset from the UCI Machine Learning Repository
- Auto MPG dataset from the UCI Machine Learning Repository
- Obama-Romney second presidential debate (2012) transcripts
Installing PyMADlib using distutils should automatically install the dependency psycopg2, which is required to connect to the PostgreSQL database on which MADlib is installed. If you are using Mac OS X 10.6.x, you may run into issues installing psycopg2.
The posts "psycopg2 and PostgreSQL 9.1 on Snow Leopard" and "Links about building psycopg on Mac OS X" discuss the issue and offer some solutions.