- Created a machine learning model that predicts whether a Titanic passenger survived or died.
Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn
As a first step, I checked the data set for missing values. The resulting missing values heatmap shows that around 20% of the values in the age column are missing, along with many values in the cabin column. To decrease the impact of the missing ages, I calculated the average age for each passenger class of the Titanic, as presented in the age per class boxplot figure. Finally, after some feature engineering, I filled the missing values with these average ages; the result is shown in the last figure, the filled missing values heatmap.
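The class-based fill described above can be sketched as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual code: the column names `Pclass` and `Age` are assumptions based on the usual Titanic data set layout, and the values are invented.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the column names "Pclass" and "Age" are assumed
# and the values are made up for the example.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Age": [38.0, np.nan, 42.0, 30.0, np.nan, 22.0, 26.0, np.nan],
})

# Share of missing values per column (what the heatmap shows visually).
missing_share = df.isnull().mean()

# Mean age within each passenger class, broadcast back to every row.
class_mean_age = df.groupby("Pclass")["Age"].transform("mean")

# Fill only the missing ages with the mean age of the passenger's class.
df["Age"] = df["Age"].fillna(class_mean_age)
```

Using `transform("mean")` keeps the result aligned with the original rows, so `fillna` can use it directly without a manual loop over classes.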
The cabin column was dropped because it had too many missing values to apply an approach similar to the one used for the age column.
First, I split the data into train and test sets, with a test size of 30%.
After training the model on the train set, I used it to predict the results for the held-out test set.
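A minimal sketch of the split, train, and predict steps is shown below. The text above does not name the classifier, so `LogisticRegression` is an assumed stand-in, and the features and labels are synthetic placeholders for the cleaned data. The `random_state` value is also only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned features and the survival label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# LogisticRegression is an assumed choice; any sklearn classifier
# exposes the same fit / predict interface.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict survival for the rows the model has not seen.
predictions = model.predict(X_test)
```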
The overall model performance on the test set is strong.
Precision is over 80% for both classes (survived / not survived). Recall is a little lower than the other key classification metrics, but it is still a reasonable result. The F1-score and accuracy are also good.
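For reference, these metrics can be computed with `sklearn.metrics`. The sketch below uses small hand-written label lists purely to show the calls; the numbers it produces are not the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for illustration only (1 = survived, 0 = not survived).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# Metrics for the "survived" class (label 1).
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions

# Per-class table with precision, recall, F1-score, and support.
report = classification_report(
    y_true, y_pred, target_names=["not survived", "survived"]
)
print(report)
```

The per-class table makes it easy to see whether one class, such as the survivors, is recalled noticeably worse than the other.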
As demonstrated, the model performs well at predicting whether a passenger survived or died, but there is still room for improvement.