Home
scikit-learn provides a library of transformers, which may clean (see Preprocessing data), reduce (see Unsupervised dimensionality reduction), expand (see Kernel Approximation) or generate (see Feature extraction) feature representations.
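As a quick orientation (not from the page itself), here is a minimal sketch instantiating one representative transformer from each of those four categories; the tiny input arrays are invented purely for illustration:

```python
from sklearn.preprocessing import StandardScaler       # clean: scale features
from sklearn.decomposition import PCA                  # reduce: project to fewer dimensions
from sklearn.kernel_approximation import Nystroem      # expand: approximate a kernel feature map
from sklearn.feature_extraction import DictVectorizer  # generate: build features from raw records

X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

print(StandardScaler().fit_transform(X).shape)          # (3, 2) - cleaned
print(PCA(n_components=1).fit_transform(X).shape)       # (3, 1) - reduced
print(Nystroem(n_components=2).fit_transform(X).shape)  # (3, 2) - expanded
print(DictVectorizer(sparse=False).fit_transform(
    [{"city": "London"}, {"city": "Paris"}]).shape)     # (2, 2) - generated
```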
Like other estimators, these are represented by classes with a fit method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a transform method which applies this transformation model to unseen data. fit_transform may be more convenient and efficient for modelling and transforming the training data simultaneously.
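A minimal sketch of that fit/transform contract, using StandardScaler as the transformer (the data is invented for illustration):

```python
from sklearn.preprocessing import StandardScaler

X_train = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
X_test = [[1.0, 15.0]]

scaler = StandardScaler()
scaler.fit(X_train)                       # learns mean_ and scale_ from the training set
X_test_scaled = scaler.transform(X_test)  # reuses those learned parameters on unseen data

# Equivalent to fit(X_train) followed by transform(X_train), in one call:
X_train_scaled = StandardScaler().fit_transform(X_train)
```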
Combining such transformers, either in parallel or in series, is covered in Pipelines and composite estimators. Pairwise metrics, Affinities and Kernels covers transforming feature spaces into affinity matrices, while Transforming the prediction target (y) considers transformations of the target space (e.g. categorical labels) for use in scikit-learn.
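For instance, here is a short sketch of the "in series" case: make_pipeline chains a scaler and PCA ahead of a regressor, and TransformedTargetRegressor handles the target transformation. The log/exp transform pair and the synthetic data are illustrative choices, not taken from the page:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.compose import TransformedTargetRegressor

rng = np.random.RandomState(0)
X = rng.rand(20, 5)
y = np.exp(X.sum(axis=1))          # strictly positive target, invented for the demo

model = make_pipeline(
    StandardScaler(),              # step 1: standardize features
    PCA(n_components=2),           # step 2: reduce to two components
    TransformedTargetRegressor(    # fits on log(y), maps predictions back with exp
        regressor=LinearRegression(),
        func=np.log,
        inverse_func=np.exp,
    ),
)
model.fit(X, y)
print(model.predict(X[:3]))
```

FeatureUnion plays the parallel role, concatenating the outputs of several transformers side by side.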
6.1. Pipelines and composite estimators
6.1.1. Pipeline: chaining estimators
6.1.2. Transforming target in regression
6.1.3. FeatureUnion: composite feature spaces
6.1.4. ColumnTransformer for heterogeneous data
6.1.5. Visualizing Composite Estimators
6.2. Feature extraction
6.2.1. Loading features from dicts
6.2.2. Feature hashing
6.2.3. Text feature extraction
6.2.4. Image feature extraction
6.3. Preprocessing data
6.3.1. Standardization, or mean removal and variance scaling -> sergiomora03
6.3.2. Non-linear transformation -> abdala9512
6.3.3. Normalization
6.3.4. Encoding categorical features
6.3.5. Discretization
6.3.6. Imputation of missing values
6.3.7. Generating polynomial features
6.3.8. Custom transformers
6.4. Imputation of missing values
6.4.1. Univariate vs. Multivariate Imputation
6.4.2. Univariate feature imputation
6.4.3. Multivariate feature imputation
6.4.4. References
6.4.5. Nearest neighbors imputation
6.4.6. Marking imputed values
6.5. Unsupervised dimensionality reduction
6.5.1. PCA: principal component analysis
6.5.2. Random projections
6.5.3. Feature agglomeration
6.6. Random Projection
6.6.1. The Johnson-Lindenstrauss lemma
6.6.2. Gaussian random projection
6.6.3. Sparse random projection
6.7. Kernel Approximation
6.7.1. Nystroem Method for Kernel Approximation
6.7.2. Radial Basis Function Kernel
6.7.3. Additive Chi Squared Kernel
6.7.4. Skewed Chi Squared Kernel
6.7.5. Mathematical Details
6.8. Pairwise metrics, Affinities and Kernels
6.8.1. Cosine similarity
6.8.2. Linear kernel
6.8.3. Polynomial kernel
6.8.4. Sigmoid kernel
6.8.5. RBF kernel
6.8.6. Laplacian kernel
6.8.7. Chi-squared kernel
6.9. Transforming the prediction target (y)
6.9.1. Label binarization
6.9.2. Label encoding
Source: https://scikit-learn.org/stable/data_transforms.html