=====================================================
This repository contains a data cleaning and exploratory data analysis (EDA) project on the Titanic dataset. The goals are to explore the relationships between variables, identify patterns and trends, and build a RandomForest classifier that predicts passenger survival.
Analyze the Titanic passenger data to uncover insights such as survival rates and the impact of various factors like gender, age, and class.
To get started with this project, follow these steps:
- Click on the "train.csv" and "test.csv" files in the repository and download them to your local machine.
Make sure you have the following libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
You can install them using pip:
pip install pandas numpy matplotlib seaborn scikit-learn
- Open the notebook Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb in Google Colab or Jupyter Notebook.
- Follow the instructions in the notebook to perform the data cleaning, EDA, and predictive modeling.
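Once the files are downloaded, loading them might look like the sketch below. The helper `load_titanic` and its default file names are assumptions for illustration, not code taken from the notebook; the repository listing names the files train_csv.csv and test_csv.csv, so pass whichever names match your local copies.

```python
from pathlib import Path

import pandas as pd


def load_titanic(data_dir=".", train_name="train.csv", test_name="test.csv"):
    """Read the training and test CSV files into two DataFrames.

    The file names are parameters because they are an assumption here;
    adjust them to the names of the files you downloaded.
    """
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / train_name)
    test = pd.read_csv(data_dir / test_name)
    return train, test


# Only attempt to load when both files are actually present.
if Path("train.csv").exists() and Path("test.csv").exists():
    train_df, test_df = load_titanic()
    print(train_df.shape, test_df.shape)
```

In Google Colab you would first upload the files to the session, or point `data_dir` at the folder that holds them.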
The repository is organized as follows:
- Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb: Jupyter Notebook containing the EDA and modeling code.
- gender_submission_csv.csv: CSV file showing the submission format.
- train_csv.csv: Training data CSV file.
- test_csv.csv: Test data CSV file.
- Data Cleaning: Handling missing values, encoding categorical variables, and feature scaling.
- Exploratory Data Analysis (EDA): Visualizing data distributions, identifying correlations, and deriving insights.
- Predictive Modeling: Building and evaluating a RandomForest classifier to predict passenger survival.
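The cleaning and EDA steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `clean` helper, the choice of median and mode imputation, and the tiny made-up sample are all assumptions. Only the column names follow the Kaggle Titanic data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up sample using Kaggle Titanic column names; not real passenger data.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, None, 54.0],
    "Fare": [7.25, 71.28, 7.92, 13.0, 53.1, 8.05],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})


def clean(frame):
    """Hypothetical cleaning step: impute, encode, and scale."""
    out = frame.copy()
    # Handle missing values: median for numeric Age, most frequent value for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Encode categorical variables: binary map for Sex, one-hot for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)
    # Feature scaling: standardize the continuous columns.
    out[["Age", "Fare"]] = StandardScaler().fit_transform(out[["Age", "Fare"]])
    return out


cleaned = clean(df)

# A simple EDA question: how does the survival rate differ by sex?
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(cleaned.head())
print(survival_by_sex)
```

Scaling is not strictly needed for tree-based models such as RandomForest; it is included here only because the list above mentions it.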
The notebook will generate visualizations and models that help in understanding the impact of various features on survival rates. It will also include the performance metrics of the RandomForest classifier used for predictions.
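As a rough illustration of the modeling output, the sketch below trains a RandomForest classifier and reports accuracy and feature importances. It runs on synthetic data generated in the snippet itself, so the numbers say nothing about the real Titanic results; the labeling rule, the hyperparameters, and the choice of accuracy as the metric are assumptions, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features; replace with the cleaned Titanic columns.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 80, size=n),
})
# Made-up labeling rule so the example has something to learn.
y = ((X["Sex"] == 1) | (X["Pclass"] == 1)).astype(int)

# Hold out a validation split to estimate performance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Performance metric on the held-out split.
accuracy = accuracy_score(y_val, model.predict(X_val))

# Feature importances indicate which features the forest relied on most.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"Validation accuracy: {accuracy:.3f}")
print(importances)
```

With the real data, `X` would hold the cleaned feature columns and `y` the Survived column from the training file.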
Feel free to fork this repository, create a branch, and submit a pull request with your contributions. Contributions are welcome!
This project is licensed under the MIT License.
- Thanks to Kaggle for providing the Titanic dataset.
- Inspiration and tutorials from various data science and machine learning resources.
If you have any questions or suggestions, please contact me at [email protected]