pvsnp9/Feature-Enginering

Feature-Engineering

Machine Learning Feature Engineering Techniques.

BIG PICTURE

A typical machine learning project follows these steps.

[Figure: ML flow]

Data Analysis:

[Figures: data analysis]

  • Feature Creation:

    • Extracting features from dates
    • Extracting features from Mixed variables
    • Missing data imputation
    • Categorical variable encoding
    • Numerical variable transformation
    • Discretization
    • Outlier Handling
    • Feature Scaling

Feature Engineering refers to:

  • Missing data imputation
  • Categorical variable encoding
  • Numerical variable transformation
  • Discretization
  • Engineering of datetime variables
  • Engineering of coordinates — GIS data
  • Feature extraction from text
  • Feature extraction from images
  • Feature extraction from time series
  • New feature creation by combining existing variables

The following techniques are covered in this repo:

  • Missing data imputation:

    • mean
    • median
    • mode
    • arbitrary
    • end of tail
    • random sample imputation
    • multivariate imputation
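
The univariate strategies above can be sketched in plain Python. This is a minimal illustration, not the repo's implementation: the `impute` helper, its `strategy` names, and the `ages` sample are hypothetical, and the end-of-tail rule used here (mean plus three standard deviations) is one common convention among several.

```python
import statistics


def impute(values, strategy="mean", fill_value=-999):
    """Replace None entries in a list using a simple univariate strategy."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    elif strategy == "arbitrary":
        # A value chosen to sit clearly outside the observed range.
        fill = fill_value
    elif strategy == "end_of_tail":
        # Assumed convention: place missing values at mean + 3 * std.
        fill = statistics.mean(observed) + 3 * statistics.stdev(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical sample with two missing entries.
ages = [22, None, 30, 30, None, 58]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
```

In a real pipeline the fill value should be learned from the training split only and then reused on the test split, so that no information leaks from held-out data.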
  • Categorical variable encoding:

    • one-hot
    • ordinal
    • mean encoding
    • weight of evidence
    • binarization
    • feature hashing
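
As a rough sketch of three of these encodings, the helpers below use only the standard library. The function names and the `colors` / `bought` sample are hypothetical; the ordinal mapping here is simply alphabetical, which is an assumption and carries no real ordering information.

```python
from collections import defaultdict


def one_hot(values):
    """Return one 0/1 column per category, plus the column order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def ordinal(values):
    """Map each category to an integer (alphabetical order, by assumption)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def mean_encode(values, target):
    """Replace each category with the mean of the target for that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, target):
        sums[v] += t
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[v] for v in values], means


# Hypothetical sample: a categorical feature and a binary target.
colors = ["red", "blue", "red", "green"]
bought = [1, 0, 0, 1]

one_hot_rows, columns = one_hot(colors)
ordinal_codes, _ = ordinal(colors)
mean_codes, _ = mean_encode(colors, bought)
```

Mean encoding uses the target, so the category means should be computed on training data only and then applied to unseen data.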
  • Numerical variable transformation:

    • logarithmic
    • reciprocal
    • exponential
    • Box-Cox
    • Yeo-Johnson
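
The power transformations above follow standard closed-form definitions, sketched below for a single value. The function names are hypothetical, and `lam` is passed in explicitly as a simplifying assumption; in practice it is usually estimated from the data (for example by maximum likelihood).

```python
import math


def log_transform(x):
    """Natural log; defined only for x > 0."""
    return math.log(x)


def reciprocal(x):
    """1 / x; undefined for x == 0."""
    return 1.0 / x


def box_cox(x, lam):
    """Box-Cox transform; requires strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; accepts zero and negative x."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1)
        return ((x + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log(-x + 1)
    return -(((-x + 1) ** (2 - lam)) - 1) / (2 - lam)
```

With `lam = 1`, Yeo-Johnson leaves the value unchanged, and Box-Cox only shifts it by one, which makes a convenient sanity check.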
  • Variable discretization:

    • equal width discretization
    • equal-frequency discretization
    • k-means discretization
    • decision tree discretization
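
To show how the first two binning schemes differ, here is a small standard-library sketch. The helper names and the `incomes` sample are hypothetical. The equal-frequency version assigns bins by rank, so tied values may land in different bins; library implementations usually derive bin edges from quantiles instead.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # Clamp so the maximum value falls in the last bin, not a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical skewed sample: one extreme value stretches the range.
incomes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
width_bins = equal_width_bins(incomes, 5)
freq_bins = equal_frequency_bins(incomes, 5)
```

On skewed data like this, equal-width binning puts almost everything in the first bin, while equal-frequency binning spreads the observations evenly.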
  • Outlier removal:

    • trimming
    • capping
    • Winsorization
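
Trimming and capping differ only in what happens to values outside the chosen limits. The sketch below assumes the inter-quartile range (IQR) proximity rule with a factor of 1.5 as the limit; the helper names and the `readings` sample are hypothetical. Capping at fixed percentiles rather than IQR-based bounds is what is usually called Winsorization.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Lower and upper limits from the IQR proximity rule (assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def trim(values, factor=1.5):
    """Drop observations that fall outside the limits."""
    lo, hi = iqr_bounds(values, factor)
    return [v for v in values if lo <= v <= hi]


def cap(values, factor=1.5):
    """Keep every observation but clip it to the limits."""
    lo, hi = iqr_bounds(values, factor)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical sample with one obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 95]
trimmed = trim(readings)
capped = cap(readings)
```

Trimming shrinks the dataset, while capping keeps its size and only changes the extreme values.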
  • Feature Scaling:

    • standardization
    • MinMax scaling
    • robust scaling
    • norm scaling
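
The four scalers reduce to short formulas, sketched here with the standard library. The function names are hypothetical. Note the assumption that norm scaling works on one sample (row) at a time, whereas the other three work on one feature (column).

```python
import math
import statistics


def standardize(column):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]


def min_max(column):
    """Rescale linearly to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def robust(column):
    """Subtract the median and divide by the inter-quartile range."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in column]


def unit_norm(row):
    """Divide a single sample by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


feature = [1, 2, 3, 4, 5]
z_scores = standardize(feature)
```

Robust scaling is the usual choice when outliers are present, since the median and IQR are barely affected by extreme values.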
  • Engineering of datetime variables:

    • extracting day, month, and year parts, and capturing elapsed time, including across different time zones
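
A minimal sketch of both ideas with the standard `datetime` module follows. The helper names, the fixed UTC offsets, and the `opened` / `closed` timestamps are hypothetical; fixed offsets are used here instead of named time zones to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def date_parts(ts):
    """Extract simple calendar features from a datetime."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "weekday": ts.weekday(),        # Monday is 0
        "is_weekend": ts.weekday() >= 5,
    }


def elapsed_hours(start, end):
    """Hours between two timezone-aware datetimes.

    Subtracting aware datetimes compares the underlying instants,
    so differing UTC offsets are handled automatically.
    """
    return (end - start).total_seconds() / 3600


# Same wall-clock time in two different (assumed) UTC offsets.
utc_plus_2 = timezone(timedelta(hours=2))
utc_minus_5 = timezone(timedelta(hours=-5))
opened = datetime(2021, 3, 15, 9, 0, tzinfo=utc_plus_2)
closed = datetime(2021, 3, 15, 9, 0, tzinfo=utc_minus_5)

parts = date_parts(opened)
hours_open = elapsed_hours(opened, closed)
```

Both timestamps read 09:00 locally, yet they are seven hours apart, which is why elapsed time should be computed on timezone-aware values.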
  • Engineering of mixed numerical and categorical variables

  • Implementations are compared across open-source Python packages such as scikit-learn and Category Encoders.
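
For the mixed numerical and categorical variables listed above, one common approach is to split each value into a label part and a number part. The sketch below assumes values shaped like an optional run of letters followed by an optional run of digits (for example a cabin code such as `C85`); the `split_mixed` helper and that format are hypothetical.

```python
import re


def split_mixed(value):
    """Split a mixed value into (label, number); missing parts become None."""
    match = re.fullmatch(r"\s*([A-Za-z]+)?\s*(\d+)?\s*", value)
    if match is None:
        # The value does not fit the assumed letters-then-digits shape.
        return None, None
    label, number = match.groups()
    return label, int(number) if number else None


# Hypothetical raw column mixing labels and numbers.
raw = ["C85", "B", "42", "E 101"]
split = [split_mixed(v) for v in raw]
```

The two resulting columns can then be handled separately, for example by encoding the label and scaling the number.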
