


MaryamAshraff2/EDA-on-Titanic-Dataset-


Titanic Data Analysis and Predictive Modeling
=====================================================

Overview


This project performs data cleaning and exploratory data analysis (EDA) on the Titanic dataset. Its goals are to explore the relationships between variables, identify patterns and trends, and build a RandomForest classifier that predicts passenger survival.

Objective


Analyze the Titanic passenger data to measure survival rates and how factors such as gender, age, and passenger class affect them.
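
As a rough illustration of this objective, the sketch below computes survival rates per group with pandas. It assumes the standard Kaggle column names `Survived`, `Sex`, and `Pclass`; the helper `survival_rate_by` and the sample rows are hypothetical and are not taken from the notebook.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of survivors within each group of `column`."""
    # `Survived` is 0/1, so the group mean is the survival rate.
    return df.groupby(column)["Survived"].mean()


# Tiny made-up sample for illustration only; not real passenger records.
sample = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 0, 0, 1],
        "Sex": ["female", "male", "female", "male", "male", "female"],
        "Pclass": [1, 3, 2, 3, 1, 3],
    }
)

rates_by_sex = survival_rate_by(sample, "Sex")
rates_by_class = survival_rate_by(sample, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

The same helper should work on the real training data once it is loaded into a DataFrame, provided the column names match.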

Getting Started


To get started with this project, follow these steps:

Step 1: Download the CSV files

  • Open the training and test CSV files in the repository (listed under Project Structure as train_csv.csv and test_csv.csv) and download them to your local machine.
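
Once the files are downloaded, loading them might look like the sketch below. The helpers `load_titanic` and `missing_value_counts` are hypothetical names, and the paths in the usage comment are placeholders to replace with the actual names of the downloaded files.

```python
from pathlib import Path

import pandas as pd


def load_titanic(train_path, test_path):
    """Read the two CSV files and return (train, test) DataFrames."""
    train = pd.read_csv(Path(train_path))
    test = pd.read_csv(Path(test_path))
    return train, test


def missing_value_counts(df):
    """Count missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


# Usage (placeholder paths; adjust to the actual file names and location):
# train, test = load_titanic("train.csv", "test.csv")
# print(train.shape, test.shape)
# print(missing_value_counts(train))
```

Checking the missing-value counts right after loading is a quick way to see which columns will need attention during cleaning.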

Step 2: Install required libraries

  • Make sure you have the following libraries installed:

    • pandas
    • numpy
    • matplotlib
    • seaborn
    • scikit-learn
  • You can install them using pip:

    pip install pandas numpy matplotlib seaborn scikit-learn
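
To confirm the installation worked, a small check like the following can be run. This is a sketch, not part of the project; `installed_versions` is a hypothetical helper. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def installed_versions(required=REQUIRED):
    """Map each package name to its version string, or None if missing."""
    versions = {}
    for package, module_name in required.items():
        try:
            module = importlib.import_module(module_name)
            versions[package] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[package] = None
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version or 'NOT INSTALLED'}")
```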

Step 3: Run the Jupyter Notebook

  • Open the notebook Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb in Google Colab or Jupyter Notebook.
  • Follow the instructions in the notebook to perform the data cleaning, EDA, and predictive modeling.

Project Structure


The repository is organized as follows:

  • Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb: Jupyter Notebook containing the EDA and modeling code.
  • gender_submission_csv.csv: Sample submission file showing the expected submission format.
  • train_csv.csv: Training data CSV file.
  • test_csv.csv: Test data CSV file.

Key Features


  • Data Cleaning: Handling missing values, encoding categorical variables, and feature scaling.
  • Exploratory Data Analysis (EDA): Visualizing data distributions, identifying correlations, and deriving insights.
  • Predictive Modeling: Building and evaluating a RandomForest classifier to predict passenger survival.
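
The data cleaning steps listed above can be sketched as follows. The README does not say which strategies the notebook uses, so median and mode imputation, the `Sex` mapping, one-hot encoding of `Embarked`, and `StandardScaler` are all assumptions here, as are the hypothetical function name `clean_titanic` and the made-up sample rows. Column names follow the standard Kaggle schema.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: imputed, encoded, and scaled."""
    out = df.copy()

    # Handle missing values: median for numeric Age, mode for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Encode categorical variables: binary map for Sex, one-hot for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)

    # Scale the continuous columns to zero mean and unit variance.
    numeric = ["Age", "Fare"]
    out[numeric] = StandardScaler().fit_transform(out[numeric])
    return out


# Made-up rows for illustration only; not real passenger records.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, None, 30.0, 40.0],
        "Fare": [7.25, 71.28, 8.05, 13.0],
        "Embarked": ["S", "C", None, "S"],
    }
)

cleaned = clean_titanic(sample)
# Pairwise correlations between columns, a common EDA starting point.
correlations = cleaned.corr()
print(cleaned.head())
```

Tree-based models such as RandomForest do not strictly need scaled inputs, so the scaling step mainly matters if other model types are tried on the same features.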

Results


The notebook generates visualizations and a trained model that show how individual features relate to survival rates. It also reports performance metrics for the RandomForest classifier used for predictions.
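
A minimal sketch of how such metrics might be produced with scikit-learn is shown below. The hyperparameters (`n_estimators=100`), the 75/25 split, the fixed seed, and the helper name `train_and_evaluate` are assumptions for illustration, and the data is synthetic so the example runs without the CSV files; the notebook's actual setup and scores may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def train_and_evaluate(features: pd.DataFrame, target: pd.Series, seed: int = 42):
    """Fit a RandomForest on a train split and score it on a held-out split."""
    X_train, X_val, y_train, y_val = train_test_split(
        features, target, test_size=0.25, random_state=seed, stratify=target
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)

    predictions = model.predict(X_val)
    accuracy = accuracy_score(y_val, predictions)
    report = classification_report(y_val, predictions, zero_division=0)
    # Impurity-based importances: one value per feature, summing to 1.
    importances = pd.Series(model.feature_importances_, index=features.columns)
    return model, accuracy, report, importances.sort_values(ascending=False)


# Synthetic stand-in data so the sketch runs without the CSV files.
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame(
    {
        "Sex": rng.integers(0, 2, n),
        "Pclass": rng.integers(1, 4, n),
        "Age": rng.uniform(1, 80, n),
    }
)
# Toy rule: the label copies Sex, so the model has a clear signal to learn.
target = pd.Series(features["Sex"].to_numpy(), name="Survived")

model, accuracy, report, importances = train_and_evaluate(features, target)
print(f"Validation accuracy: {accuracy:.3f}")
print(report)
print(importances)
```

Holding out part of the labeled training data is needed because the Kaggle test file does not include survival labels, so it cannot be used to compute accuracy locally.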

Contributing


Feel free to fork this repository, create a branch, and submit a pull request with your contributions. Contributions are welcome!

License


This project is licensed under the MIT License.

Acknowledgments


  • Thanks to Kaggle for providing the Titanic dataset.
  • Inspiration and tutorials from various data science and machine learning resources.

Contact


If you have any questions or suggestions, please contact me at [email protected]
