This is a well-known, introductory Kaggle challenge: Titanic - Machine Learning from Disaster. All contents were written by myself.
- titanic.py => main function
- titanic_clean.py => functions for cleaning data

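The dataset fields are:
- survival: Survival (0 = No, 1 = Yes)
- pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- sex: Sex
- Age: Age in years
- sibsp: Number of siblings / spouses aboard the Titanic
- parch: Number of parents / children aboard the Titanic
- ticket: Ticket number
- fare: Passenger fare
- cabin: Cabin number
- embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
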
The feature engineering steps I have done here are as follows:
- Create a new feature, "family size", by combining the features "SibSp" and "Parch"
- Remove non-numeric characters from "Ticket"
- Extract the first name from the feature "Name"
- Encode the features "Sex" and "Name"
- Normalize all features in the dataset
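
The steps above could be sketched roughly as follows with pandas and scikit-learn. This is only an illustration, not the code in titanic_clean.py: the function name `engineer_features`, the column name `FamilySize`, the use of a plain sum for family size, label encoding, and min-max scaling as the normalization are all assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the listed feature engineering steps."""
    out = df.copy()

    # Family size: assumed here to be the plain sum of SibSp and Parch.
    out["FamilySize"] = out["SibSp"] + out["Parch"]

    # Keep only the digits of Ticket; tickets without digits become 0 (assumption).
    digits = out["Ticket"].astype(str).str.replace(r"\D", "", regex=True)
    out["Ticket"] = pd.to_numeric(digits, errors="coerce").fillna(0)

    # Kaggle names look like "Surname, Title. Given names";
    # take the first word after the title as the first name.
    first_name = out["Name"].str.extract(r"\.\s*\(?(\w+)", expand=False)
    out["Name"] = first_name.fillna("Unknown")

    # Encode the categorical features as integers (label encoding is an assumption).
    for col in ["Sex", "Name"]:
        out[col] = LabelEncoder().fit_transform(out[col])

    # Normalize every numeric column to [0, 1] (min-max scaling is an assumption).
    numeric_cols = list(out.select_dtypes(include="number").columns)
    scaled = MinMaxScaler().fit_transform(out[numeric_cols])
    for i, col in enumerate(numeric_cols):
        out[col] = scaled[:, i]

    return out
```

In practice the scaler and encoders would be fitted on the training set and reused on the test set, so that both are transformed consistently.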
Model: random forest from scikit-learn, with GridSearchCV (also from scikit-learn) used to find the best estimator.
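
A minimal sketch of how such a search might look is shown below. It assumes the classifier variant (`RandomForestClassifier`, since survival is a 0/1 label); the function name `tune_random_forest`, the parameter grid, the number of folds, and the accuracy scoring are placeholders, not the values used in titanic.py.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, param_grid=None, cv=5, seed=0):
    """Hypothetical helper: grid-search a random forest and return the best one."""
    if param_grid is None:
        # Placeholder grid; the values actually searched are not given in this README.
        param_grid = {
            "n_estimators": [100, 300],
            "max_depth": [4, 8, None],
        }

    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        cv=cv,               # k-fold cross-validation on the training data
        scoring="accuracy",  # assumed metric: fraction of correct predictions
    )
    search.fit(X, y)

    # best_estimator_ is refitted on the full training data by default (refit=True).
    return search.best_estimator_, search.best_params_, search.best_score_
```

The returned estimator can then be used directly, for example `model.predict(X_test)`, to produce the predictions for a submission file.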
Result: score of 0.78468, ranking of 9015.