Primula

What is Primula

Primula is a serverless shuffle operator for general-purpose serverless frameworks. It is built on the principles of scalability and transparency, and is designed for shuffle-like operations in routine data analysis pipelines.

Primula provides several features for automating shuffle-like workloads, mainly:

Automatic inference of the optimal number of parallel workers for the shuffle operation.
Load-balancing of workers through data sampling.
Eager mitigation of straggler functions.
Asynchronous MapReduce execution for performance.
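
To illustrate the idea behind load-balancing through data sampling, here is a minimal sketch of sample-based range partitioning, a common way to balance shuffle partitions. It is not taken from Primula's code; the function names `choose_boundaries` and `assign_partition` and the synthetic skewed keys are assumptions made for illustration only.

```python
import bisect
import random


def choose_boundaries(sample, num_workers):
    """Pick num_workers - 1 split points from a sample of keys (hypothetical helper)."""
    ordered = sorted(sample)
    # Evenly spaced quantiles of the sample become the partition boundaries.
    return [ordered[(i * len(ordered)) // num_workers] for i in range(1, num_workers)]


def assign_partition(key, boundaries):
    """Return the index of the worker whose key range contains this key."""
    return bisect.bisect_right(boundaries, key)


if __name__ == "__main__":
    rng = random.Random(0)
    # Skewed synthetic keys: most values are small, a few are large.
    keys = [int(rng.expovariate(0.01)) for _ in range(10000)]
    # Only a small sample is inspected to decide the boundaries.
    sample = rng.sample(keys, 500)
    boundaries = choose_boundaries(sample, num_workers=4)
    counts = [0] * 4
    for k in keys:
        counts[assign_partition(k, boundaries)] += 1
    # Number of keys each of the 4 workers would receive.
    print(counts)
```

Because the boundaries follow the quantiles of the sample rather than fixed-width ranges, each worker should receive a roughly similar share of the data even when the key distribution is skewed.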

Architecture

Primula is based on IBM-PyWren (now Lithops), a serverless framework for massively parallel jobs. It currently supports IBM Cloud Functions as the FaaS platform and IBM Cloud Object Storage (COS) as remote shared storage for data persistence and communication between functions.

Prerequisites

Python >= 3.4

Python version must be 3.4 or above.

IBM Cloud account

Detailed information about IBM Cloud account configuration can be found in the IBM-PyWren GitHub repository.
1. Sign up on https://cloud.ibm.com/
2. Copy config.json.template to config.json.
3. Create a new IBM COS bucket and insert its public and private endpoint URLs and IBM COS keys into config.json.
4. Set the new bucket's name as pywren's "storage_bucket" in config.json.
5. Create a new IBM Cloud Functions namespace and a Cloud Foundry organization and insert your access keys into config.json.
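
As a sanity check after the steps above, the following sketch reports which entries are still missing from config.json. Only the "storage_bucket" entry under pywren is named in this README; the other section and key names (`ibm_cos`, `ibm_cf`, `endpoint`, `private_endpoint`, `namespace`) and the helper names are assumptions and should be checked against config.json.template.

```python
import json

# Assumed layout: only "pywren" / "storage_bucket" is named in the steps above.
# The remaining section and key names are illustrative guesses.
REQUIRED_KEYS = {
    "pywren": ["storage_bucket"],
    "ibm_cos": ["endpoint", "private_endpoint"],
    "ibm_cf": ["endpoint", "namespace"],
}


def missing_keys(config, required=REQUIRED_KEYS):
    """Return 'section.key' strings that are absent or empty in config."""
    missing = []
    for section, keys in required.items():
        values = config.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append("{}.{}".format(section, key))
    return missing


def check_config_file(path="config.json"):
    """Load a JSON config file and return the list of missing entries."""
    with open(path, encoding="utf-8") as fh:
        return missing_keys(json.load(fh))
```

Calling `check_config_file()` from the folder that contains config.json returns an empty list when every assumed entry is filled in.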

Dataset placement

Datasets to be processed with Primula must be located in an IBM COS bucket.

Setup and configuration

Installing Primula in a Python environment is straightforward.

Clone this repository into your Python environment.
Move to the extended IBM-PyWren project folder:
```
cd pywren-ibm-primula
```
Install IBM-PyWren along with Primula:
```
pip install -e .
```

Usage

Primula can be executed either from the command line or from a Jupyter Notebook. We provide an example workflow at examples/primula_example_basic.ipynb.

Authors

Marc Sánchez-Artigas (Universitat Rovira i Virgili) marc.sanchez@urv.cat
Germán T. Eizaguirre (Universitat Rovira i Virgili) germantelmo.eizaguirre@urv.cat

Publications

Primula: a Practical Shuffle/Sort Operator for Serverless Computing - ACM/IFIP Middleware 2020

Acknowledgements

This work has been partially supported by the EU Horizon 2020 programme under grant agreement 825184 and by the Spanish Government (PID2019-106774RB-C22).
