Profile: https://www.kaggle.com/vicrsp
All the Python scripts are located in the Source folder. Jupyter notebooks are in the Notebooks folder.
- main.py: reads the train.parquet file and applies all the signal processing steps. It should be the first file to be executed and will generate the all_features.csv file in the Outputs folder (a minimal sketch of this flow appears after this list).
- feature_extraction.py: contains the feature extraction routines.
- signal_denoising.py: contains the PD denoising routines.
- dashboard.py: contains a simple application written in Dash to visualize the relationships in the all_features.csv file.
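As a quick orientation, here is a minimal sketch of the main.py flow under the folder layout named above. extract_basic_features is a hypothetical stand-in for the project's actual denoising and feature extraction routines, which live in signal_denoising.py and feature_extraction.py.

```python
# Minimal sketch of the main.py pipeline, not the actual implementation.
# extract_basic_features is a hypothetical stand-in for the project's
# denoising and feature extraction routines.
import numpy as np
import pandas as pd

def extract_basic_features(x: np.ndarray) -> dict:
    # Toy feature set; the real pipeline denoises the signal first and
    # computes a much richer set of features.
    return {"mean": x.mean(), "std": x.std(), "peak": np.abs(x).max()}

# pyarrow is the engine the project lists as required for .parquet files.
signals = pd.read_parquet("Input/train.parquet", engine="pyarrow")
rows = [extract_basic_features(signals[c].to_numpy()) for c in signals.columns]
pd.DataFrame(rows).to_csv("Outputs/all_features.csv", index=False)
```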
- data_exploration_notebook: this notebook was used to generate the visualizations in the report.
- data_processing_and_modelling: this notebook loads the all_features.csv file, applies the pre-processing steps described in the report, and executes the machine learning algorithms (see the sketch below).
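A hedged sketch of that notebook's flow, assuming all_features.csv carries a target column with the fault labels; the scaler and classifier are illustrative placeholders, since the actual pre-processing steps and models are the ones described in the report.

```python
# Illustrative sketch only: the real pre-processing and models are
# described in the report. The "target" column name is an assumption.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = pd.read_csv("Outputs/all_features.csv")
X = features.drop(columns=["target"])
y = features["target"]

# Hold out a stratified test split and fit a scaled classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```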
Summarizing:
- Execute main.py to process the .parquet data.
- Execute the data_processing_and_modelling notebook to pre-process the data and learn the models.
The data can be downloaded at:
https://www.kaggle.com/c/vsb-power-line-fault-detection/data
Only the metadata_train.csv and train.parquet files are required for this project. They must be placed in the Input folder. Output data generated by the scripts is located in the Output folder.
- Python version used for the project development: 3.6.6
- The pyarrow package is necessary to read the .parquet files.
- The pywt (PyWavelets) package is necessary to perform the DWT denoising procedures (see the sketch after this list).
- Other required packages: scipy, seaborn, matplotlib, pandas, numpy, statsmodels, scikit-learn
- Dash and plotly are only required to run the dashboard.py file.
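For context, below is a minimal, hypothetical sketch of DWT-based denoising with pywt, in the spirit of signal_denoising.py; the wavelet, decomposition level, and universal soft threshold are assumptions, not the project's actual parameters.

```python
# Hypothetical DWT denoising sketch with pywt; the wavelet, level, and
# threshold rule are assumptions, not the project's actual choices.
import numpy as np
import pywt

def dwt_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    # Decompose into approximation and detail coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Estimate the noise scale from the finest detail coefficients
    # and derive the universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(signal)))
    # Soft-threshold every detail level, then reconstruct the signal.
    coeffs[1:] = [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)
```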
All the report files are located in the Report folder. Direct links: