Machine Learning Engineer Nanodegree Udacity - Capstone Project

vicrsp/mlen-capstone-udacity

VSB Power Line Fault Detection - Kaggle

Profile: https://www.kaggle.com/vicrsp

Source code

All Python scripts are located in the Source folder. Jupyter notebooks are in the Notebooks folder.

Scripts

  • main.py: reads the train.parquet file and applies all the signal processing steps. Run it first; it generates the all_features.csv file in the Outputs folder.
  • feature_extraction.py: contains the feature extraction routines
  • signal_denoising.py: contains the partial discharge (PD) denoising routines
  • dashboard.py: a simple Dash application for visualizing the relationships in the all_features.csv file.
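
To illustrate the general idea behind DWT denoising, here is a minimal sketch. It is not the code in signal_denoising.py: it uses a hand-written single-level Haar transform in plain Python instead of pywt, and the function names and the threshold value are assumptions chosen for the example. The wavelet, decomposition level and thresholding rule actually used in the project are not stated here.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt: rebuild the signal from its coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Transform, soft-threshold the detail coefficients, transform back."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

Small detail coefficients (assumed to be noise) are set to zero, while large ones are shrunk by the threshold but survive. With pywt, the same three steps would typically use `pywt.wavedec`, `pywt.threshold` and `pywt.waverec`.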

Notebooks

  • data_exploration_notebook: this notebook was used to generate the visualizations in the report
  • data_processing_and_modelling: this notebook loads the all_features.csv file, applies the preprocessing steps described in the report and runs the machine learning algorithms

In summary:

  1. Execute main.py to process the .parquet data
  2. Execute the data_processing_and_modelling notebook to preprocess the data and train the models.

Data

The data can be downloaded at:

https://www.kaggle.com/c/vsb-power-line-fault-detection/data

Only the metadata_train.csv and train.parquet files are required for this project. They must be placed in the Input folder. Output data generated by the scripts is located in the Output folder.
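
Before running main.py, it can help to confirm that both files are in place. The following is a small sketch of such a check; the `missing_inputs` helper is hypothetical (not part of the repository), and the path to the Input folder is an assumption that depends on where the script is run from.

```python
from pathlib import Path

# The two files this project needs from the Kaggle download.
REQUIRED_INPUTS = ("metadata_train.csv", "train.parquet")


def missing_inputs(input_dir):
    """Return the required file names that are not present in input_dir."""
    folder = Path(input_dir)
    return [name for name in REQUIRED_INPUTS if not (folder / name).is_file()]


if __name__ == "__main__":
    # "Input" is assumed to be relative to the current working directory.
    missing = missing_inputs("Input")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All required input files found.")
```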

Environment Configuration

  • Python version used for the project development: 3.6.6
  • The pyarrow package is required to read the .parquet files.
  • The pywt package (distributed on PyPI as PyWavelets) is required for the discrete wavelet transform (DWT) denoising procedures.
  • Other required packages: scipy, seaborn, matplotlib, pandas, numpy, statsmodels, scikit-learn
  • Dash and plotly are only required to run the dashboard.py file
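
A quick way to see which of these packages are missing from the current environment is sketched below. The `missing_modules` helper is hypothetical and not part of the repository. The names are import names, so scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
CORE_MODULES = [
    "pyarrow", "pywt", "scipy", "seaborn", "matplotlib",
    "pandas", "numpy", "statsmodels", "sklearn",
]
# Only needed for dashboard.py.
DASHBOARD_MODULES = ["dash", "plotly"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing core modules:", missing_modules(CORE_MODULES) or "none")
    print("Missing dashboard modules:", missing_modules(DASHBOARD_MODULES) or "none")
```

The check only looks the modules up; it does not import them or verify their versions.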

Report

All the report files are located in the Report folder. Direct links:
