We are building a Machine Learning repository where we will upload several datasets along with their solutions and explanations, starting from the basics and moving up in difficulty level.
Focusing on both classification and regression, we have selected the following datasets to work on.
1) Iris ✔️
2) Titanic ✔️
3) Education dataset ✔️
4) MNIST ✔️
5) Hand SIGNS ✔️
1) Boston housing ✔️
2) Red Wine ✔️
3) Medical cost personal dataset ✔️
4) Car price prediction ✔️
5) Human Resource Data Set
6) New York stock exchange data
7) Deep fake detection
Classification
First, if you have a classification problem, that is, predicting the class of a given input, consider the following models.
Slow but accurate
1) Non-linear SVM
2) Random Forest
3) Neural Network (needs a lot of data points)
4) Gradient Boosting Tree (similar to Random Forest, but more prone to overfitting)
Fast
1) Explainable models: Decision Tree and Logistic Regression
2) Non-explainable models: Linear SVM and Naive Bayes
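As a rough illustration of the classification models listed above, the following sketch fits each of them on the Iris dataset and prints its test accuracy. It assumes scikit-learn is installed; the split ratio, the `random_state` values, and the default hyperparameters are our own choices for the example, not tuned settings, and the scores on such a small dataset say little about how the models compare in general.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
# All hyperparameters below are illustrative assumptions.
models = {
    # "Slow but accurate" group
    "Non-linear SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)
    ),
    "Gradient Boosting Tree": GradientBoostingClassifier(random_state=0),
    # "Fast" group
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC(max_iter=10000)),
    "Naive Bayes": GaussianNB(),
}

# Fit every model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in another classification dataset only requires replacing the `load_iris` call with code that produces a feature matrix `X` and a label vector `y`.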
Regression
If you have a regression problem, that is, predicting a continuous value such as the price of a house given features like its size, number of rooms, etc., consider the following models.
Slow but accurate
1) Random Forest
2) Neural Network (needs a lot of data points)
3) Gradient Boosting Tree (similar to Random Forest, but more prone to overfitting)
Fast
1) Decision Tree
2) Linear Regression
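The regression models listed above can be compared in the same way. The sketch below is a minimal example that assumes scikit-learn is installed and uses its bundled diabetes dataset purely as a stand-in, because it loads without a download; it is not one of the datasets in this repository. The split, the `random_state` values, and the hyperparameters are illustrative assumptions, and the R² values will differ on other data.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Stand-in dataset (assumption): 442 samples, 10 numeric features,
# continuous target.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are illustrative assumptions, not tuned values.
models = {
    # "Slow but accurate" group
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPRegressor(max_iter=5000, random_state=0)
    ),
    "Gradient Boosting Tree": GradientBoostingRegressor(random_state=0),
    # "Fast" group
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Linear Regression": LinearRegression(),
}

# For regressors, score() returns the coefficient of determination (R^2)
# on the given data; 1.0 is a perfect fit and values can be negative.
r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {r2[name]:.3f}")
```

R² is used here only because it is the default metric of `score()`; mean absolute error or root mean squared error may be easier to interpret for price-like targets.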
You may add your own datasets with solutions, or request solutions from us. Happy Coding!!