Natural Language Processing with Disaster Tweets

Introduction

This project implements an NLP pipeline for the disaster tweets Kaggle competition using the Brane framework. The implementation is split into two Brane packages, compute and visualization, each of which can be imported individually and reused in other workflows.

compute exposes utilities for preprocessing data, training a classifier, and generating a valid submission file for the challenge.
visualization provides functions to generate plots and charts based on the dataset.

We also include a github.yml specification which defines an OpenAPI container that exposes a function to download arbitrary files from GitHub repositories.

Build

Each package can be individually imported with the following command:

brane package import -c epi-project/brane-disaster-tweets-example packages/<PACKAGE_NAME>/container.yml

For convenience, we also provide a shell script. After cloning the repository, run ./build-package.sh to build all of our packages. To build a specific package, pass its name as an argument:

# build the computation package
./build-package.sh compute
# build the visualization package
./build-package.sh visualization

Alternatively, navigate to a package directory and build the Brane package directly:

brane package build container.yml
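
The per-directory build above can be repeated for several packages with a small loop. The following is a minimal sketch, not the contents of build-package.sh: the function name build_packages and the BRANE override variable are assumptions introduced here for illustration, and the packages/<PACKAGE_NAME> layout is taken from the import command shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: build each named package from inside its own directory.
# BRANE is a hypothetical override (not part of the repository); setting
# BRANE=echo prints the arguments instead of invoking the brane CLI.
build_packages() {
    local brane_cmd="${BRANE:-brane}"
    local pkg
    for pkg in "$@"; do
        # A subshell keeps the caller's working directory unchanged.
        # Stop at the first package that is missing or fails to build.
        ( cd "packages/${pkg}" && "${brane_cmd}" package build container.yml ) || return 1
    done
}
```

From the root of the repository, `build_packages compute visualization` would build both packages in turn, and `BRANE=echo build_packages compute` shows what would be run without building anything.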

Data

Besides the packages, the datasets used by the workflow must also be built. The included ./build-data.sh script builds both the training and testing datasets. Each dataset can also be built individually:

# For the training dataset
brane data build ./data/train/data.yml
# For the testing dataset
brane data build ./data/test/data.yml

Run

Our pipeline implementation can be executed locally by simply running the following command in the root folder of the project:

brane workflow run pipeline.bs
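
Because the workflow depends on the packages and datasets having been built first, the three steps can be chained so that a failure stops the sequence early. This is a minimal sketch assuming it is run from the root of the project; the function name run_pipeline and the RUNNER variable are hypothetical helpers introduced here, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: run the documented steps in order and stop at the first failure.
# RUNNER is a hypothetical hook: RUNNER=echo prints each command instead
# of executing it; when unset, the commands run directly.
run_pipeline() {
    local runner="${RUNNER:-}"
    # Build all packages, then the datasets, then execute the workflow.
    ${runner} ./build-package.sh || return 1
    ${runner} ./build-data.sh || return 1
    ${runner} brane workflow run pipeline.bs || return 1
}
```

Calling `RUNNER=echo run_pipeline` lists the commands in the order they would be executed, which is a quick way to check the sequence before running it for real.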

The following picture shows an example of running the whole pipeline from pipeline.bs on a Kubernetes cluster.

Attribution

This repository is an up-to-date version of the work of Andrea Marino and Jingye Wang. It aims to reproduce their implementation exactly for a newer version of the framework. Their original repository can be found here.
