MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
See the project webpage, MADlib Home, for links to the latest binary and
source packages. For installation and contribution guides, please see the
MADlib Wiki.
The latest documentation of MADlib modules can be found at MADlib Docs, or can
be accessed directly from the MADlib installation directory by opening
doc/user/html/index.html.
The following block diagram gives a high-level overview of MADlib's architecture.
MADlib incorporates material from the following third-party components:
- argparse 1.2.1 "provides an easy, declarative interface for creating command line tools"
- Boost 1.46.1 (or newer) "provides peer-reviewed portable C++ source libraries"
- CERN ROOT "is an object oriented framework for large scale data analysis"
- doxypy 0.4.2 "is an input filter for Doxygen"
- Eigen 3.0.3 "is a C++ template library for linear algebra"
- PyYAML 3.10 "is a YAML parser and emitter for Python"
License information regarding MADlib and included third-party libraries can be
found inside the license
directory.
Changes between MADlib versions are described in the
ReleaseNotes.txt
file.