End-to-end ML solution for predicting insurance premiums with a focus on code quality and maintainability

psukh28/lightgbm-insurance-predictor


Insurance Premium Predictor

A machine learning solution for predicting insurance premiums using LightGBM. This project demonstrates end-to-end ML pipeline development with a focus on code quality and maintainability.

🎯 Project Overview

This project implements a robust machine learning pipeline for predicting insurance premiums. It features:

  • Custom feature engineering for insurance data
  • LightGBM model with optimized parameters
  • 5-fold cross-validation
  • RMSLE (Root Mean Squared Logarithmic Error) optimization
  • Type-safe implementation with comprehensive error handling

🛠️ Technical Stack

  • Python: 3.11+
  • Core Libraries:
    • lightgbm: Gradient boosting framework
    • pandas: Data manipulation
    • numpy: Numerical operations
    • scikit-learn: ML utilities

📁 Project Structure

insurance-premium-predictor/
├── src/
│   ├── features/           # Feature engineering
│   │   └── feature_engineering.py
│   ├── models/            # Model implementations
│   │   └── models.py
│   ├── utils/             # Utility functions
│   │   └── metrics.py
│   └── train.py          # Training pipeline
├── data/                  # Data directory
├── submissions/          # Model predictions
└── requirements.txt      # Project dependencies

🔧 Features

Feature Engineering

  • Automated categorical variable handling
  • Domain-specific feature creation:
    • Income per dependent
    • Claims per year
    • Policy duration analysis
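
The features listed above could be derived roughly as follows. This is a minimal sketch, not the code in feature_engineering.py: the column names (annual_income, num_dependents, previous_claims, policy_start_date), the function name, and the fixed reference date are all assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def add_domain_features(df: pd.DataFrame, reference_date: str = "2025-01-01") -> pd.DataFrame:
    """Return a copy of df with three illustrative domain features added.

    All column names used here are hypothetical placeholders.
    """
    out = df.copy()

    # Income per dependent: add 1 so zero dependents does not divide by zero.
    out["income_per_dependent"] = out["annual_income"] / (out["num_dependents"] + 1)

    # Policy duration in years, measured up to a fixed (assumed) reference date.
    start = pd.to_datetime(out["policy_start_date"])
    out["policy_years"] = (pd.Timestamp(reference_date) - start).dt.days / 365.25

    # Claims per year: clip the duration at 1 so very new policies
    # do not inflate the ratio.
    out["claims_per_year"] = out["previous_claims"] / out["policy_years"].clip(lower=1.0)

    # The raw date is no longer needed once the duration has been derived.
    out = out.drop(columns=["policy_start_date"])

    # Categorical handling: LightGBM accepts the pandas "category" dtype directly.
    for col in out.columns:
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out
```

Adding 1 to the dependent count and clipping the duration are one way to keep the ratios finite; the project may handle these edge cases differently.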

Model Implementation

  • LightGBM with early stopping
  • Optimized hyperparameters
  • Cross-validation for robust evaluation

Quality Assurance

  • Type hints throughout
  • Comprehensive error handling
  • Detailed logging
  • Modular code structure
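
To give a sense of what these practices can look like together, here is a small hypothetical loader that combines type hints, explicit error handling, and logging. The function name and its checks are assumptions for illustration and are not taken from train.py.

```python
from __future__ import annotations

import logging
from pathlib import Path

import pandas as pd

logger = logging.getLogger(__name__)


def load_training_data(path: str | Path, required_columns: list[str]) -> pd.DataFrame:
    """Load a CSV file and fail early with a clear message if it is unusable."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Training data not found: {csv_path}")

    df = pd.read_csv(csv_path)

    # Check the schema up front rather than failing deep inside the pipeline.
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    logger.info("Loaded %d rows and %d columns from %s", len(df), df.shape[1], csv_path)
    return df
```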

📊 Model Performance

The model is evaluated using 5-fold cross-validation with RMSLE as the metric:

  • Mean CV RMSLE: 1.1425
  • Standard Deviation: ± 0.0055
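
For reference, RMSLE is the square root of the mean squared difference between log(1 + prediction) and log(1 + target). The sketch below shows one way to compute it and to summarize per-fold scores into a mean and standard deviation; the function names are illustrative and may differ from those in metrics.py.

```python
import numpy as np


def rmsle(y_true, y_pred) -> float:
    """Root Mean Squared Logarithmic Error for non-negative values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # This sketch rejects negative inputs, as premiums cannot be negative.
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("rmsle expects non-negative values")
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def summarize_folds(fold_scores) -> tuple[float, float]:
    """Return the mean and (population) standard deviation of fold scores."""
    scores = np.asarray(fold_scores, dtype=float)
    return float(scores.mean()), float(scores.std())
```

Because the error is measured on a log scale, it penalizes relative rather than absolute differences, which suits a target such as premiums that spans a wide range.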

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Your Name

🌟 Acknowledgments

  • Data source: Kaggle Playground Series S4-E12
  • Inspiration: Insurance premium prediction challenge
  • Libraries: LightGBM, scikit-learn, pandas
