This project features an implementation of an NLP pipeline for the Disaster Tweets Kaggle competition using the Brane framework. The implementation is divided into the following Brane packages, which can be imported individually and reused in other workflows: compute and visualization.
compute exposes utilities for preprocessing data, training a classifier, and generating a valid submission file for the challenge.
visualization provides functions to generate plots and charts based on the dataset.
We also include a github.yml specification, which defines an OpenAPI container exposing a function to download arbitrary files from GitHub repositories.
Each package can be imported individually with the following command:
brane package import -c epi-project/brane-disaster-tweets-example packages/<PACKAGE_NAME>/container.yml
For convenience, we also provide a shell script: after cloning the repository, run ./build-package.sh to build all of our packages. You can also pass a package name to build only that package:
# build the computation package
./build-package.sh compute
# build the visualization package
./build-package.sh visualization
Alternatively, you can navigate to a package directory and run the following command to build that Brane package:
brane package build container.yml
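For readers who want to see what such a convenience script might look like, here is a minimal Bash sketch under stated assumptions. It is not the repository's build-package.sh: the function names build_package and build_packages and the DRY_RUN switch are hypothetical, and it assumes each package lives in packages/<name>/ with a container.yml inside, matching the import path shown earlier. It only runs brane package build container.yml from inside each package directory, as described above.

```shell
# Hypothetical helper approximating ./build-package.sh; NOT the repository's actual script.
# Assumption: every package lives in packages/<name>/ and contains a container.yml.
PACKAGES=(compute visualization)   # the two packages described in this README

build_package() {
  local dir="packages/$1"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Dry run: print the command instead of executing it (no Brane CLI needed)
    echo "(cd ${dir} && brane package build container.yml)"
  else
    # Build from inside the package directory, in a subshell so the caller's cwd is unchanged
    (cd "${dir}" && brane package build container.yml)
  fi
}

build_packages() {
  # With arguments, build only the named packages; otherwise build all of them
  local names=("$@")
  if [[ ${#names[@]} -eq 0 ]]; then names=("${PACKAGES[@]}"); fi
  local name
  for name in "${names[@]}"; do
    build_package "${name}" || return 1   # stop at the first failed build
  done
}
```

After sourcing the file, build_packages builds both packages, build_packages compute builds only the computation package, and DRY_RUN=1 build_packages prints the commands without running them. Running each build in a subshell keeps the caller in the project root between packages.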
Besides the packages, we also need to build the datasets used by the workflow. This can be done with the included ./build-data.sh script, which builds the training and testing datasets, or manually with the following commands:
# For the training dataset
brane data build ./data/train/data.yml
# For the testing dataset
brane data build ./data/test/data.yml
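In the same spirit, a data-building helper could loop over the two dataset specifications listed above. The sketch below is hypothetical: build_datasets and the DRY_RUN switch are illustrative names rather than anything taken from build-data.sh, and it only issues the two brane data build commands shown in this README.

```shell
# Hypothetical sketch of a build-data.sh-style helper; NOT the repository's actual script.
# The two dataset names come from this README: ./data/train and ./data/test.
DATASETS=(train test)

build_datasets() {
  local name spec
  for name in "${DATASETS[@]}"; do
    spec="./data/${name}/data.yml"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      # Dry run: only print the command
      echo "brane data build ${spec}"
    else
      # Build the dataset; stop at the first failure
      brane data build "${spec}" || return 1
    fi
  done
}
```

Run it from the project root, since the dataset paths are relative.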
The pipeline can be executed locally by running the following command in the root folder of the project:
brane workflow run pipeline.bs
The following picture shows an example of our packages running the whole pipeline defined in pipeline.bs on a Kubernetes cluster.
This repository is an up-to-date version of the work of Andrea Marino and Jingye Wang; it aims to reproduce exactly what they implemented, ported to a newer version of the framework. Their original repository can be found here.