MADlib Installer Notes (2011 Feb)

agorajek edited this page Feb 25, 2011 · 1 revision

Arguably controversial suggestions, but (in my humble opinion) worth a discussion:

1) Provide end users with a MADlib installer package on all supported platforms (rpm on Red Hat/CentOS, deb on Debian/Ubuntu, MacPorts on Mac, ...). This installer package declares all required dependencies (LAPACK etc.), so we do not have to worry about them. In short: Installing MADlib at the OS level with all prerequisites should be no more complicated than, say, "yum install madlib".

2) Unless this already exists: Greenplum (GP) should provide a script (e.g., "gp-yum" on RH/CentOS) that runs the OS package manager on all GP nodes to create identical setups.
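As a rough illustration of what such a script could do, here is a minimal Python sketch that builds the same package-manager command for every node and runs it over ssh. Everything in it is an assumption made for illustration: the function names, the host names, the use of plain ssh, and the default of "yum -y". It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_remote_command(host, package_args, package_manager="yum"):
    """Return the argv list that runs the package manager on one host via ssh."""
    # Quote every word so the remote shell sees exactly these arguments.
    remote = " ".join(shlex.quote(a) for a in [package_manager, "-y", *package_args])
    return ["ssh", host, remote]


def run_on_all_hosts(hosts, package_args, dry_run=True):
    """Run the same package-manager command on every host.

    Returns a dict mapping each host to an exit code. In dry-run mode the
    commands are only printed and every exit code is reported as 0.
    """
    results = {}
    for host in hosts:
        argv = build_remote_command(host, package_args)
        if dry_run:
            print(" ".join(shlex.quote(a) for a in argv))
            results[host] = 0
        else:
            results[host] = subprocess.run(argv).returncode
    return results


if __name__ == "__main__":
    # Hypothetical host names; a real script would read them from the GP configuration.
    run_on_all_hosts(["mdw", "sdw1", "sdw2"], ["install", "madlib"])
```

A real tool would also need to stop or report clearly when one node fails, since the goal is identical setups on all nodes.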

3) Ship binary packages where possible (to make installing a matter of seconds). Compiling from source is optional, for those users who know what they are doing. In fact, we can expect those users to look into our (less polished) developer documentation to learn which library dependencies there are and how to build MADlib in general.

4) If necessary: The rpm, deb, …​ packages install MADlib at a location independent of the database (e.g., /usr/local/share/madlib) so that it will not interfere with other OS-level packages.

5) Only expose the option to install MADlib in whole.

6) After installing MADlib at the OS level, the user only has two commands available for managing the db level: "madpack install <dbname> [<schemaname>]" and "madpack uninstall <dbname> [<schemaname>]" where:
* <schemaname> defaults to "madlib"
* "install" also upgrades
* There are connection options ("-h", "-U", ... just as for psql)
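The proposed command line can be sketched with Python's argparse. This is only an illustration of the interface suggested in item 6, not an existing madpack implementation. The long option names "--host" and "--username" are borrowed from psql, and showing only two connection options is a simplification. Because "-h" is meant to be the host option as in psql, the sketch disables argparse's default "-h" help flag and keeps only "--help".

```python
import argparse


def build_parser():
    """Build a parser for the two proposed madpack commands."""
    # add_help=False frees "-h" so it can mean "host", as it does in psql.
    parser = argparse.ArgumentParser(prog="madpack", add_help=False)
    parser.add_argument("--help", action="help", help="show this help message and exit")
    parser.add_argument("-h", "--host", default=None, help="database server host")
    parser.add_argument("-U", "--username", default=None, help="database user name")
    parser.add_argument("command", choices=["install", "uninstall"])
    parser.add_argument("dbname")
    # The schema name is optional and defaults to "madlib".
    parser.add_argument("schemaname", nargs="?", default="madlib")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.command, args.dbname, args.schemaname, args.host, args.username)
```

For example, "madpack install mydb" would parse to the schema "madlib", while "madpack -h localhost -U gpadmin uninstall mydb analytics" would target the schema "analytics" (the database, user, and schema names here are made up).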

7) If necessary, the install command will inform the user that the PostgreSQL/GP configuration has to be changed to include the paths to MADlib. If the user agrees, madpack will then automatically modify the configuration and restart the db server.
- Optional: Allow the user to install MADlib into template1, so that any newly created database will automatically contain MADlib.
- madpack -v should report the version of MADlib installed at OS level. A SQL function madlib.version() should report the version of MADlib installed in a db.
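Since "install" is also meant to upgrade, one way to use the two version reports is to compare them and decide what the install command should do. The following Python sketch is only an assumption about how that decision could look. It assumes purely numeric dotted version strings such as "0.1.0", which these notes do not specify, and the function names and returned labels are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric version string such as "0.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def plan_install(os_version, db_version):
    """Decide what "madpack install" should do for one database.

    os_version: version installed at the OS level (what "madpack -v" would report).
    db_version: version installed in the database (what madlib.version() would
                report), or None if MADlib is not installed there yet.
    """
    if db_version is None:
        return "install"
    os_v, db_v = parse_version(os_version), parse_version(db_version)
    if os_v > db_v:
        return "upgrade"
    if os_v == db_v:
        return "up to date"
    # The database holds a newer version than the OS-level package.
    return "database newer than package"


if __name__ == "__main__":
    print(plan_install("0.2.0", "0.1.0"))
```

The last case can occur after an OS-level downgrade, so a real tool would have to decide whether to refuse or to downgrade the database objects as well.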

8) Politics once MADlib has left its infancy:
- GP Community Edition should always come with MADlib preinstalled and preconfigured. (Perhaps GP should also come as a proper rpm/deb/... installer package on its supported platforms?)
- Commercial GP should ship with the latest stable release of MADlib. Adventurous users could still upgrade as suggested above, say, with "gp-yum update madlib". (Using the OS-level package manager, users could easily downgrade again.)
- Another option is to have commercial GP include MADlib in an opaque way (no OS-level package). Benefit: MADlib appears even more as an integral part of GP. Disadvantage: Less flexibility for users.