-
-
- -
-
-
-
-
-

Assembling

-

fuzzy_join(): Joining -tables on non-normalized categories with approximate matching. Example

-

Joiner, -AggJoiner: transformers for joining multiple tables together. Example

+
+
+

skrub is a Python library easing preprocessing and feature engineering for + tabular machine learning. Our long-term goal is to directly connect + database tables to machine learning estimators. +

-
-
-
-
-
-

Encoding

TableVectorizer: -turn a pandas dataframe into a numerical array for -machine learning. Example

-

GapEncoder: -OneHotEncoder but robust to typos or non-normalized categories. Example

+
-
-
-
-

Cleaning

deduplicate(): merge -categories of similar morphology (spelling). Example

-
-
+ +
+ +
+ +
+
+

Effortless Pipelines

+

Create strong scikit-learn pipeline baselines effortlessly with + TableVectorizer + and + tabular_learner.

-
-
-