Profile: https://www.kaggle.com/vicrsp
All the Python scripts are located in the Source folder. Jupyter notebooks are in the Notebooks folder.
- main.py: reads the train.parquet file and applies all the signal processing steps. It should be the first file to be executed and will generate the all_features.csv file in the Outputs folder (a minimal sketch of this flow appears after this list).
- feature_extraction.py: contains the feature extraction routines.
- signal_denoising.py: contains the PD denoising routines.
- dashboard.py: contains a simple application written in Dash to visualize the relationships in the all_features.csv file.
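As a quick orientation, here is a minimal sketch of the main.py flow under the folder layout named above. extract_basic_features is a hypothetical stand-in for the project's actual denoising and feature extraction routines, which live in signal_denoising.py and feature_extraction.py.

```python
# Minimal sketch of the main.py pipeline, not the actual implementation.
# extract_basic_features is a hypothetical stand-in for the project's
# denoising and feature extraction routines.
import numpy as np
import pandas as pd

def extract_basic_features(x: np.ndarray) -> dict:
    # Toy feature set; the real pipeline denoises the signal first and
    # computes a much richer set of features.
    return {"mean": x.mean(), "std": x.std(), "peak": np.abs(x).max()}

# pyarrow is the engine the project lists as required for .parquet files.
signals = pd.read_parquet("Input/train.parquet", engine="pyarrow")
rows = [extract_basic_features(signals[c].to_numpy()) for c in signals.columns]
pd.DataFrame(rows).to_csv("Outputs/all_features.csv", index=False)
```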
- data_exploration_notebook: this notebook was used to generate the visualizations in the report.
- data_processing_and_modelling: this notebook loads the all_features.csv file, applies the pre-processing steps described in the report, and executes the machine learning algorithms (see the sketch below).
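A hedged sketch of that notebook's flow, assuming all_features.csv carries a target column with the fault labels; the scaler and classifier are illustrative placeholders, since the actual pre-processing steps and models are the ones described in the report.

```python
# Illustrative sketch only: the real pre-processing and models are
# described in the report. The "target" column name is an assumption.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = pd.read_csv("Outputs/all_features.csv")
X = features.drop(columns=["target"])
y = features["target"]

# Hold out a stratified test split and fit a scaled classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```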
Summarizing:
- Execute main.py to process the .parquet data.
- Execute the data_processing_and_modelling notebook to pre-process the data and learn the models.
The data can be downloaded at:
https://www.kaggle.com/c/vsb-power-line-fault-detection/data
Only the metadata_train.csv and train.parquet files are required for this project. They must be placed in the Input folder. Output data generated by the scripts is located in the Output folder.
- Python version used for the project development: 3.6.6
- The pyarrow package is necessary to read the .parquet files.
- The pywt (PyWavelets) package is necessary to perform the DWT denoising procedures (see the sketch after this list).
- Other required packages: scipy, seaborn, matplotlib, pandas, numpy, statsmodels, scikit-learn
- Dash and plotly are only required to run the dashboard.py file.
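For context, below is a minimal, hypothetical sketch of DWT-based denoising with pywt, in the spirit of signal_denoising.py; the wavelet, decomposition level, and universal soft threshold are assumptions, not the project's actual parameters.

```python
# Hypothetical DWT denoising sketch with pywt; the wavelet, level, and
# threshold rule are assumptions, not the project's actual choices.
import numpy as np
import pywt

def dwt_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    # Decompose into approximation and detail coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate the noise scale from the finest detail coefficients
    # and derive the universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(signal)))
    # Soft-threshold every detail level, then reconstruct the signal.
    coeffs[1:] = [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)
```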
All the report files are located in the Report folder. Direct links: