This repository contains the code for the paper "CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows", ICDE 2024.
This project requires Python 3.11. Install the universalis-package, the requirements of the coordinator and worker modules, and pandas, numpy, and matplotlib. You can use the following commands:
```bash
pip install universalis-package/.
pip install -r coordinator/requirements.txt
pip install -r worker/requirements.txt
pip install pandas numpy matplotlib
```
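If you prefer an isolated environment, the following is a minimal sketch (the `.venv` name is our choice, not mandated by the repository):

```bash
# Optional: create an isolated Python 3.11 environment first.
# Assumes python3.11 is on your PATH; ".venv" is an arbitrary name.
python3.11 -m venv .venv
source .venv/bin/activate
# ...then run the pip install commands above
```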
In the `scripts` directory, we provide several scripts that can be used to run the CheckMate experiments. The easiest way is to create a CSV file formatted as follows:
```csv
# experiment_name,query,protocol,checkpoint_interval,num_of_workers,input_rate,failure,hot_item_ratio
example-q1-unc,q1,UNC,5,4,4000,true,0.0
example-q1-cor,q1,COR,5,4,4000,true,0.0
```
The CSV file should not include the header row (shown above only for reference), and a trailing newline is required after the last configuration line.
Each parameter can take the following values:
| Parameter | Values |
|---|---|
| experiment_name | Any name allowed by your OS. It is used to create a folder where all the results of the experiment are stored, and as a prefix for the created files. |
| query | q1, q3, q8-running, q12-running, cyclic |
| protocol | NOC, UNC, COR, CIC |
| checkpoint_interval | Any value > 0. |
| num_of_workers | Any integer > 0. Every worker requires 2 CPUs. |
| input_rate* | Any integer > 0. |
| failure | true / false |
| hot_item_ratio | 0 (applicable only in NexMark queries). |
* For the cyclic query, the generator uses 3 threads, so the value should be 1/3 of the desired total input rate.
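As an illustration, a configuration sweep can be generated with a small shell loop (a hedged sketch: the file name `experiments.csv` and all parameter values are examples, not required by the scripts; the lowercase expansion `${proto,,}` needs bash 4+):

```bash
# Hypothetical sketch: one q1 configuration per protocol.
# echo terminates every line with a newline, so the required trailing
# newline after the last configuration line is present.
for proto in NOC UNC COR CIC; do
  echo "sweep-q1-${proto,,},q1,${proto},5,4,4000,true,0.0"
done > experiments.csv
```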
We provide a CSV file containing sample configurations; a CSV file containing all the configurations used in our experiments will follow.
Using either the provided CSV files or your own, you can run the experiments with the following script from the root of the repository:
```bash
./scripts/run_batch_experiments.sh location_of_the_csv_file directory_to_save_results
```
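For example, with the sweep file sketched above (both paths are illustrative):

```bash
./scripts/run_batch_experiments.sh experiments.csv ./results
```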
Note: to run experiments with the NexMark queries, you must first build the generator by running `mvn clean package` from the `nexmark` directory. Java 11 and Maven are required.
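For example, from the repository root (a sketch; assumes `java` and `mvn` are already on your PATH):

```bash
# Verify the prerequisites, then build the generator.
java -version   # should report version 11
mvn -version
cd nexmark && mvn clean package && cd ..
```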
Alternatively, you can manage the individual components of the pipeline yourself. First, deploy the Kafka cluster and the MinIO storage:
```bash
# start Kafka
docker compose -f docker-compose-kafka.yml up
# tear down Kafka and clear its volumes
docker compose -f docker-compose-kafka.yml down --volumes

# start MinIO
docker compose -f docker-compose-simple-minio.yml up
# tear down MinIO and clear its volumes
docker compose -f docker-compose-simple-minio.yml down --volumes
```
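To confirm both stacks are up before proceeding, a quick check (a sketch; `docker compose ps` simply lists the services each compose file defines):

```bash
docker compose -f docker-compose-kafka.yml ps
docker compose -f docker-compose-simple-minio.yml ps
```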
Then, start the stream processing engine (SPE) at the desired scale:
```bash
# start the SPE with 4 workers
docker compose up --build --scale worker=4
# tear down the SPE and clear its volumes
docker compose down --volumes
```
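Putting the manual steps together, an end-to-end session might look like this (a sketch assuming all compose files sit in the repository root; `-d` runs the containers detached):

```bash
# bring up the infrastructure in the background
docker compose -f docker-compose-kafka.yml up -d
docker compose -f docker-compose-simple-minio.yml up -d
# build and start the SPE with 4 workers (each worker needs 2 CPUs)
docker compose up --build --scale worker=4 -d
# ...run an experiment, e.g. via the batch script...
# tear everything down and clear state
docker compose down --volumes
docker compose -f docker-compose-simple-minio.yml down --volumes
docker compose -f docker-compose-kafka.yml down --volumes
```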