- Introduction
- Project Components
  - 2.1. ETL Pipeline
  - 2.2. ML Pipeline
  - 2.3. Flask Web App
- Program
  - 3.1. Dependencies
  - 3.2. Running
  - 3.3. Additional Material
- Acknowledgement
- Screenshots
## Introduction

This project is part of the Data Science Nanodegree Program given by Udacity in collaboration with Figure Eight.
The initial dataset contains pre-labelled messages that were sent during disaster events.
The aim of the project is to build a Natural Language Processing tool that categorizes messages.
## Project Components

There are three main components.
### 2.1. ETL Pipeline

File `process_data.py` contains an ETL (Extract, Transform, Load) pipeline that:

- Loads the `messages` and `categories` datasets
- Merges the two datasets
- Cleans the data
- Stores the result in an SQLite database
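The steps above can be sketched as follows. The miniature DataFrames, the `water-1;roads-0` category encoding, and the `messages` table name are illustrative assumptions, not the project's actual data layout:

```python
import pandas as pd
from sqlalchemy import create_engine

# Miniature stand-ins for disaster_messages.csv and disaster_categories.csv;
# the column layout here is an illustrative assumption.
messages = pd.DataFrame({
    "id": [1, 2],
    "message": ["We need water", "Roads are blocked"],
})
categories = pd.DataFrame({
    "id": [1, 2],
    "categories": ["water-1;roads-0", "water-0;roads-1"],
})

# Merge the two datasets on their shared id column.
df = messages.merge(categories, on="id")

# Clean: expand the semicolon-separated category string into binary columns.
expanded = df["categories"].str.split(";", expand=True)
for col in expanded.columns:
    name = expanded[col].iloc[0][:-2]              # e.g. "water-1" -> "water"
    df[name] = expanded[col].str[-1].astype(int)   # last character is 0 or 1
df = df.drop(columns=["categories"]).drop_duplicates()

# Store the cleaned table in an SQLite database (in-memory here; the real
# script would write to a file such as data/DisasterResponse.db).
engine = create_engine("sqlite://")
df.to_sql("messages", engine, index=False, if_exists="replace")
```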
### 2.2. ML Pipeline

File `train_classifier.py` contains a machine learning pipeline that:

- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Builds a text processing and machine learning pipeline
- Trains and tunes a model using GridSearchCV
- Outputs results on the test set
- Exports the final model as a pickle file
### 2.3. Flask Web App

File `run.py` launches a web app where users can input a message and have it classified with the previously trained model.
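A minimal sketch of such an app follows. The `/classify` route, the category names, and the stub model are hypothetical stand-ins so the example is self-contained; the project's actual `run.py` unpickles the trained classifier instead:

```python
from flask import Flask, request

app = Flask(__name__)

# Stub standing in for the pickled classifier so this sketch runs on its own;
# the real app would load models/classifier.pkl at startup.
class StubModel:
    def predict(self, texts):
        return [[1, 0] for _ in texts]  # pretend every message is "water"

model = StubModel()
CATEGORIES = ["water", "roads"]  # illustrative category names

# Hypothetical route: classify the message passed as ?query=...
@app.route("/classify")
def classify():
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    return {c: int(v) for c, v in zip(CATEGORIES, labels)}

# To serve it on the address the README points to:
# app.run(host="127.0.0.1", port=4000)
```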
## Program

### 3.1. Dependencies

- Python 3.5+
- Machine Learning Libraries: NumPy, Pandas, Scikit-Learn
- Natural Language Processing Libraries: NLTK
- SQLite Database Libraries: SQLAlchemy
- Web App and Data Visualization: Flask, Plotly
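Since no requirements file is mentioned, the libraries above can be installed directly under their standard PyPI names:

```shell
pip install numpy pandas scikit-learn nltk sqlalchemy flask plotly
```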
### 3.2. Running

Run the following commands in the project's root directory to set up your database and model. Note that steps 1 and 2 (ETL and ML) are optional because a trained model is already included in the repository.
- To run the ETL pipeline that cleans data and stores it in the database:

  ```shell
  python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  ```

- To run the ML pipeline that trains and saves the classifier:

  ```shell
  python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  ```

- To run the web app (run it from the app's directory):

  ```shell
  python run.py
  ```

- Go to http://127.0.0.1:4000/
### 3.3. Additional Material

In the `Notebooks` directory you will find the ETL and ML pipelines in Jupyter notebooks, which may help you understand how the model works step by step.

Please note that the ML pipeline in the notebook (`.ipynb`) performs parameter tuning, whereas the one in script (`.py`) format uses a fixed set of parameters. The script's parameters are not the optimal ones found in the notebook: the optimal model produced a large pickle file (>100 MB), and simplifying the parameters considerably reduced its size.
## Acknowledgement

- Udacity for providing such a complete Data Science Nanodegree Program.
- Figure Eight for providing the dataset.