Uses machine learning to develop models capable of predicting customer subscription from the current dataset.
The dataset is composed of 21 variables: 20 independent (X) and 1 dependent (y).
- The code analyses the spread of the current dataset
- Calculates the correlation between each input variable and the output variable
- Identifies the 5 x(i) variables most correlated with y
- Model 1 - Linear regression model using 20 inputs
- Model 2 - Linear regression model using the 5 inputs most correlated with 'y'
- Splits the dataset into training and test sets at a ratio of 9:1
- Evaluates the accuracy of the machine learning models
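
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: it runs on synthetic data, the column names `x1`..`x20` and `y` are hypothetical, the target is assumed to be a 0/1 label, and turning the regression output into a label with a 0.5 threshold is an assumption about how accuracy might be measured.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 20 numeric inputs x1..x20 and a 0/1 target y.
# Column names and values are hypothetical, not taken from "data set.csv".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 20)), columns=[f"x{i}" for i in range(1, 21)])
signal = 2.0 * X["x1"] - 1.5 * X["x2"] + 1.5 * X["x3"] + X["x4"] - X["x5"]
y = (signal + rng.normal(scale=0.5, size=1000) > 0).astype(int).rename("y")

# Correlation of each input with y, then the 5 inputs with the largest absolute correlation.
correlations = X.corrwith(y)
top5 = correlations.abs().sort_values(ascending=False).head(5).index.tolist()

# 9:1 train/test split (test_size=0.1).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)


def fit_and_score(columns):
    """Fit a linear regression on the given columns and return test-set accuracy.

    The continuous prediction is thresholded at 0.5 to get a 0/1 label;
    that threshold is an assumption of this sketch.
    """
    model = LinearRegression().fit(X_train[columns], y_train)
    predicted = (model.predict(X_test[columns]) >= 0.5).astype(int)
    return float((predicted == y_test.to_numpy()).mean())


accuracy_all = fit_and_score(list(X.columns))  # Model 1: all 20 inputs
accuracy_top5 = fit_and_score(top5)            # Model 2: 5 most correlated inputs

print("Top 5 inputs:", top5)
print(f"Model 1 accuracy: {accuracy_all:.3f}")
print(f"Model 2 accuracy: {accuracy_top5:.3f}")
```

To try this on real data, the synthetic `X` and `y` would be replaced by columns read with `pd.read_csv`, with any non-numeric columns encoded as numbers first.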
- The program is written in Python 3. Make sure Python 3 is installed
- Please ensure you have the following libraries installed:
- pandas
- numpy
- matplotlib
- scipy
- sklearn (installed under the package name scikit-learn)
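
One way to confirm that the libraries listed above are available is a short check like the one below. It is only a sketch and not part of main.py; the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names for the libraries listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "matplotlib", "scipy", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```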
- Please have the following files in the same directory as main.py
- data set.csv
- Open a terminal
- Set the current directory to the folder containing the main.py file.
- Run the script from the terminal by typing: python3 main.py
- The plots generated by the code will be saved in the same directory as the main.py file.
- Jerin Philips Rajan