Insurance Premium Predictor

A machine learning solution for predicting insurance premiums with LightGBM. This project demonstrates end-to-end ML pipeline development with a focus on code quality and maintainability.

🎯 Project Overview

This project implements a robust machine learning pipeline for predicting insurance premiums. It features:

Custom feature engineering for insurance data
LightGBM model with optimized parameters
5-fold cross-validation
RMSLE (Root Mean Squared Logarithmic Error) optimization
Type-hinted implementation with comprehensive error handling

🛠️ Technical Stack

Python: 3.11+
Core Libraries:
- lightgbm: Gradient boosting framework
- pandas: Data manipulation
- numpy: Numerical operations
- scikit-learn: ML utilities

📁 Project Structure

insurance-premium-predictor/
├── src/
│   ├── features/          # Feature engineering
│   │   └── feature_engineering.py
│   ├── models/            # Model implementations
│   │   └── models.py
│   ├── utils/             # Utility functions
│   │   └── metrics.py
│   └── train.py           # Training pipeline
├── data/                  # Data directory
├── submissions/           # Model predictions
└── requirements.txt       # Project dependencies

🔧 Features

Feature Engineering

Automated categorical variable handling
Domain-specific feature creation:
- Income per dependent
- Claims per year
- Policy duration analysis
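
The features listed above could be derived along the following lines. This is a minimal sketch, not the code in feature_engineering.py: the function name, the snake_case column names (`annual_income`, `num_dependents`, `previous_claims`, `insurance_duration`, `policy_start_date`) and the `as_of` reference date are all assumptions made for illustration.

```python
import pandas as pd


def add_domain_features(df: pd.DataFrame, as_of: str = "2025-01-01") -> pd.DataFrame:
    """Return a copy of df with illustrative domain features added.

    All column names used here are hypothetical placeholders.
    """
    out = df.copy()

    # Income per dependent: the +1 avoids division by zero when there are no dependents.
    out["income_per_dependent"] = out["annual_income"] / (out["num_dependents"] + 1)

    # Claims per year: clip the duration to at least one year so new policies do not yield inf.
    out["claims_per_year"] = out["previous_claims"] / out["insurance_duration"].clip(lower=1)

    # Policy duration: years elapsed between the policy start date and the reference date.
    start = pd.to_datetime(out["policy_start_date"])
    out["policy_age_years"] = (pd.Timestamp(as_of) - start).dt.days / 365.25
    out = out.drop(columns=["policy_start_date"])

    # Categorical handling: cast remaining string columns to the pandas category dtype,
    # which LightGBM can consume without one-hot encoding.
    for col in out.select_dtypes(include=["object", "string"]).columns:
        out[col] = out[col].astype("category")
    return out
```

Clipping the duration and adding one to the dependent count are just one way to guard the ratios; the actual pipeline may handle zeros and missing values differently.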

Model Implementation

LightGBM with early stopping
Optimized hyperparameters
Cross-validation for robust evaluation

Quality Assurance

Type hints throughout
Comprehensive error handling
Detailed logging
Modular code structure

📊 Model Performance

The model is evaluated using 5-fold cross-validation with RMSLE as the metric:

Mean CV RMSLE: 1.1425
Standard Deviation: ± 0.0055
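
For reference, RMSLE is the square root of the mean squared difference between log(1 + prediction) and log(1 + target). A minimal sketch of the metric and of the fold summary is given below; the function names are assumptions and may differ from those in utils/metrics.py, and the use of the population standard deviation (NumPy's default) is also an assumption.

```python
import numpy as np


def rmsle(y_true, y_pred) -> float:
    """Root Mean Squared Logarithmic Error for non-negative targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("RMSLE is undefined for negative values")
    # log1p(x) computes log(1 + x) accurately for small x.
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def summarize_folds(scores) -> tuple[float, float]:
    """Return the mean and (population) standard deviation of per-fold scores."""
    arr = np.asarray(scores, dtype=float)
    return float(arr.mean()), float(arr.std())
```

Because the error is measured on a log scale, RMSLE penalizes relative rather than absolute differences, which suits a skewed target such as premium amounts.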

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

GitHub: @psukh28
LinkedIn: surya-praanv-sukumaran

🌟 Acknowledgments

Data source: Kaggle Playground Series S4-E12
Inspiration: Insurance premium prediction challenge
Libraries: LightGBM, scikit-learn, pandas
