Titanic Data Analysis and Predictive Modeling

=====================================================

Overview

This repository contains a data cleaning and exploratory data analysis (EDA) project on the Titanic dataset. The analysis explores relationships between variables, identifies patterns and trends, and builds a RandomForest classifier to predict passenger survival.

Objective

Analyze the Titanic passenger data to measure survival rates and to see how factors such as gender, age, and passenger class affect them.
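
As a rough illustration of that objective, the sketch below computes a survival rate per group with pandas. It assumes the standard Kaggle column names (`Survived`, `Sex`, `Pclass`); the helper `survival_rate_by` and the toy rows are hypothetical and are not taken from the notebook.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of survivors within each group of `column`.

    Assumes `Survived` is coded 0/1, as in the Kaggle training file.
    """
    return df.groupby(column)["Survived"].mean().sort_values(ascending=False)


# Toy rows for illustration only; these are NOT real Titanic records.
toy = pd.DataFrame(
    {
        "Sex": ["female", "female", "male", "male"],
        "Pclass": [1, 3, 1, 3],
        "Survived": [1, 1, 0, 1],
    }
)

rates = survival_rate_by(toy, "Sex")
print(rates)
```

With the real data, the same helper could be called as `survival_rate_by(pd.read_csv("train.csv"), "Pclass")`, assuming train.csv sits in the working directory.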

Getting Started

To get started with this project, follow these steps:

Step 1: Download the CSV files

Click on the "train.csv" and "test.csv" files in the repository and download them to your local machine.

Step 2: Install required libraries

Make sure you have the following libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn

You can install them using pip:

pip install pandas numpy matplotlib seaborn scikit-learn
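
To confirm the installation worked before opening the notebook, a small check like the following may help. It is only a sketch: the helper `missing_packages` is hypothetical, and the one non-obvious detail is that the scikit-learn package is imported under the module name `sklearn`.

```python
import importlib.util

# Maps the pip package name to the module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```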

Step 3: Run the Jupyter Notebook

Open the notebook Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb in Google Colab or Jupyter Notebook.
Follow the instructions in the notebook to perform the data cleaning, EDA, and predictive modeling.

Project Structure

The repository is organized as follows:

- Project_1_Exploratory_Data_Analysis_(EDA)_on_Titanic_Dataset.ipynb: Jupyter Notebook containing the EDA and modeling code.
- gender_submission.csv: CSV file showing the submission format.
- train.csv: Training data CSV file.
- test.csv: Test data CSV file.

Key Features

- Data Cleaning: Handling missing values, encoding categorical variables, and feature scaling.
- Exploratory Data Analysis (EDA): Visualizing data distributions, identifying correlations, and deriving insights.
- Predictive Modeling: Building and evaluating a RandomForest classifier to predict passenger survival.
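
The cleaning and modeling stages listed above could be wired together roughly as follows. This is a minimal sketch, not the notebook's actual code: the column choices, the median and most-frequent imputation strategies, and the helper `build_pipeline` are assumptions, and the toy rows are invented for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature columns, following the standard Kaggle Titanic schema.
NUMERIC = ["Age", "Fare", "SibSp", "Parch"]
CATEGORICAL = ["Sex", "Embarked"]


def build_pipeline(random_state: int = 0) -> Pipeline:
    """Impute, scale, and encode the features, then fit a random forest."""
    numeric = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),
            # Tree models do not need scaling; it is included only because
            # the feature list above mentions it.
            ("scale", StandardScaler()),
        ]
    )
    categorical = Pipeline(
        [
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    prep = ColumnTransformer(
        [("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)]
    )
    forest = RandomForestClassifier(n_estimators=100, random_state=random_state)
    return Pipeline([("prep", prep), ("model", forest)])


# Toy rows for illustration only; these are NOT real Titanic records.
toy = pd.DataFrame(
    {
        "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0, 27.0, 14.0],
        "Fare": [7.0, 70.0, 8.0, 50.0, 52.0, 21.0, 11.0, 30.0],
        "SibSp": [1, 1, 0, 1, 0, 3, 0, 1],
        "Parch": [0, 0, 0, 0, 0, 1, 2, 0],
        "Sex": ["male", "female", "female", "female",
                "male", "male", "female", "female"],
        "Embarked": ["S", "C", "S", "S", np.nan, "S", "S", "C"],
        "Survived": [0, 1, 1, 1, 0, 0, 1, 1],
    }
)

features = toy[NUMERIC + CATEGORICAL]
model = build_pipeline().fit(features, toy["Survived"])
preds = model.predict(features)
print(preds)
```

Keeping the preprocessing inside the pipeline means the same imputation and encoding learned from the training rows are reused when predicting on test.csv.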

Results

The notebook generates visualizations and a trained model that show how individual features affect survival rates. It also reports performance metrics for the RandomForest classifier used for predictions.
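
The README does not say which metrics the notebook reports. As one plausible approach, the sketch below scores a RandomForest classifier on a held-out split using accuracy and a confusion matrix. The helper `evaluate_holdout` is hypothetical, and the data comes from scikit-learn's synthetic `make_classification` generator as a stand-in, not from the Titanic files.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.25, random_state=0):
    """Fit on a training split and score on the held-out remainder."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    return {
        "accuracy": accuracy_score(y_val, pred),
        "confusion_matrix": confusion_matrix(y_val, pred),
    }


# Synthetic stand-in data; replace with the cleaned Titanic features and labels.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
report = evaluate_holdout(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y
)
print("Accuracy:", report["accuracy"])
print(report["confusion_matrix"])
```

Scoring on rows the model has not seen gives a more honest estimate than scoring on the training rows, which a random forest tends to fit almost perfectly.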

Contributing

Feel free to fork this repository, create a branch, and submit a pull request with your contributions. Contributions are welcome!

License

This project is licensed under the MIT License.

Acknowledgments

- Thanks to Kaggle for providing the Titanic dataset.
- Thanks also to the various data science and machine learning resources that provided inspiration and tutorials.

Contact

If you have any questions or suggestions, please contact me at [email protected]

MaryamAshraff2/EDA-on-Titanic-Dataset-
