MAINT use composition in TableVectorizer #675

Closed
wants to merge 32 commits into from
Changes from 1 commit
32 commits
40004e0
MAINT activate common test sklearn
glemaitre Jul 18, 2023
2299f09
iter
glemaitre Jul 18, 2023
b524edd
Merge remote-tracking branch 'origin/main' into common_test
glemaitre Jul 19, 2023
57904da
TST make GapEncoder compatible with scikit-learn
glemaitre Jul 19, 2023
4d6602c
iter
glemaitre Jul 19, 2023
2f0cb58
SimilarityEncoder compat
glemaitre Jul 19, 2023
2efa4ad
DatetimeEncoder support
glemaitre Jul 19, 2023
7c379e5
iter
glemaitre Jul 20, 2023
b45f48c
iter
glemaitre Jul 20, 2023
eae158a
iter
glemaitre Jul 20, 2023
0f778b0
fix ci
glemaitre Jul 20, 2023
a3c2255
iter
glemaitre Jul 20, 2023
37c75e8
iter
glemaitre Jul 20, 2023
92087c9
iter
glemaitre Jul 20, 2023
837920f
Merge remote-tracking branch 'origin/main' into improve_table_vectorizer
glemaitre Jul 20, 2023
ba9e28b
Merge remote-tracking branch 'origin/main' into improve_table_vectorizer
glemaitre Jul 21, 2023
1ae54eb
MAINT use composition in TableVectorizer
glemaitre Jul 21, 2023
4cf8806
iter
glemaitre Jul 21, 2023
69a8082
iter
glemaitre Jul 21, 2023
e56f922
iter
glemaitre Jul 21, 2023
39c1d23
pep8
glemaitre Jul 21, 2023
2ec22f6
iter
glemaitre Jul 21, 2023
3b43b2b
iter
glemaitre Jul 21, 2023
d79cace
Merge branch 'main' of https://github.com/skrub-data/skrub into impro…
LilianBoulard Aug 18, 2023
cb8ad3b
Clean error
LilianBoulard Aug 18, 2023
6b5e6d3
remove ._columns from table_vectorizer
Vincent-Maladiere Aug 30, 2023
bfa8699
Merge branch 'main' into improve_table_vectorizer
Vincent-Maladiere Aug 30, 2023
eb793e8
fix tests because I removed 'self.columns_' earlier
Vincent-Maladiere Aug 31, 2023
a57dfa0
Merge branch 'main' into improve_table_vectorizer
Vincent-Maladiere Aug 31, 2023
5102457
add properties tests
Vincent-Maladiere Aug 31, 2023
41c5cc6
add docstring to properties
Vincent-Maladiere Sep 1, 2023
c0ef079
add get_params and set_params to enable grid_search
Vincent-Maladiere Sep 1, 2023

66 changes: 66 additions & 0 deletions skrub/_table_vectorizer.py

@LeoGrin and @glemaitre, this should fix #709! This is very similar to what is done in `ColumnTransformer`.
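
For readers unfamiliar with the convention referred to here: scikit-learn compositions expose the parameters of their sub-estimators under `<name>__<param>` keys, which is what lets a grid search reach into them. The sketch below only illustrates that naming scheme in plain Python. `ToyEncoder` and `ToyComposition` are hypothetical stand-ins, not skrub or scikit-learn classes, and the real logic in scikit-learn handles more cases than this.

```python
# Simplified illustration of the "<name>__<param>" convention.
# ToyEncoder and ToyComposition are hypothetical names for this sketch only.


class ToyEncoder:
    def __init__(self, n_components=10):
        self.n_components = n_components

    def get_params(self, deep=True):
        return {"n_components": self.n_components}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class ToyComposition:
    def __init__(self, steps):
        # steps is a list of (name, estimator) tuples of length 2
        self.steps = steps

    def get_params(self, deep=True):
        out = {"steps": self.steps}
        if not deep:
            return out
        for name, est in self.steps:
            # Expose the sub-estimator itself, then each of its parameters
            out[name] = est
            for key, value in est.get_params(deep=True).items():
                out[f"{name}__{key}"] = value
        return out

    def set_params(self, **params):
        named = dict(self.steps)
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep and name in named:
                # "<name>__<param>" is routed to the matching sub-estimator
                named[name].set_params(**{sub_key: value})
            elif key == "steps":
                self.steps = value
            else:
                raise ValueError(f"Invalid parameter {key!r}")
        return self


comp = ToyComposition([("gap", ToyEncoder())])
comp.set_params(gap__n_components=3)
params = comp.get_params()
```

With this scheme, a key such as `gap__n_components` can be listed in a parameter grid without the caller holding a reference to the encoder itself.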

@@ -875,7 +875,7 @@
        X = self._auto_cast(X)

        if self.verbose:
            print(f"[TableVectorizer] Assigned transformers: {self._transformers}")

        self._column_transformer = ColumnTransformer(
            transformers=self._transformers,

Codecov / codecov/patch: added line #L878 of skrub/_table_vectorizer.py was not covered by tests.
@@ -896,7 +896,7 @@
        self.transformers_ = self._column_transformer.transformers_
        for i, (name, enc, cols) in enumerate(self.transformers_):
            if name == "remainder" and len(cols) < 20:
                self.transformers_[i] = (
                    name,
                    enc,
                    self.feature_names_in_[cols].tolist(),

Codecov / codecov/patch: added line #L899 of skrub/_table_vectorizer.py was not covered by tests.
@@ -948,6 +948,72 @@
        """
        return self._column_transformer.get_feature_names_out()

    def get_params(self, deep=True) -> dict:
        """Get parameters for this estimator.

        Returns the parameters given in the constructor as well as the
        estimators contained within the `specific_transformers` of the
        `TableVectorizer`.

        Parameters
        ----------
        deep : bool, default=True
            If True, will return the parameters for this estimator and
            contained subobjects that are estimators.

        Returns
        -------
        params : dict
            Parameter names mapped to their values.
        """
        return self._get_params("_specific_transformers", deep=deep)

    def set_params(self, **kwargs) -> "TableVectorizer":
        """Set the parameters of this estimator.

        Valid parameter keys can be listed with ``get_params()``. Note that you
        can directly set the parameters of the estimators contained in
        `specific_transformers` of `TableVectorizer`.

        Parameters
        ----------
        **kwargs : dict
            Estimator parameters.

        Returns
        -------
        self : TableVectorizer
            This estimator.
        """
        self._set_params("_specific_transformers", **kwargs)
        return self

Codecov / codecov/patch: added lines #L988 - L989 of skrub/_table_vectorizer.py were not covered by tests.

    @property
    def _specific_transformers(self) -> list[tuple[str, TransformerMixin]]:
        """Accessor to specific_transformers, with elements of length 2.

        Internal list of specific_transformers containing only the names and
        transformers, dropping the columns. This is for the implementation
        of get_params via _BaseComposition._get_params, which expects lists
        of tuples of length 2.
        """
        try:
            return [(name, trans) for name, trans, _ in self.specific_transformers]
        except (TypeError, ValueError):
            return self.specific_transformers

Codecov / codecov/patch: added line #L1000 of skrub/_table_vectorizer.py was not covered by tests.
Codecov / codecov/patch: added lines #L1002 - L1003 of skrub/_table_vectorizer.py were not covered by tests.

    @_specific_transformers.setter
    def _specific_transformers(self, value):
        try:
            self.specific_transformers = [
                (name, trans, col)
                for ((name, trans), (_, _, col)) in zip(
                    value, self.specific_transformers
                )
            ]
        except (TypeError, ValueError):
            self.specific_transformers = value

Codecov / codecov/patch: added line #L1007 of skrub/_table_vectorizer.py was not covered by tests.
Codecov / codecov/patch: added lines #L1014 - L1015 of skrub/_table_vectorizer.py were not covered by tests.

    @property
    def named_transformers_(self) -> Bunch:
        """Map transformer names to transformer objects.
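
The `_specific_transformers` property and its setter in the diff above project `(name, transformer, columns)` triples down to `(name, transformer)` pairs and back. A minimal standalone sketch of that round trip follows, assuming plain strings in place of transformer objects. The helper names `to_pairs` and `from_pairs` are hypothetical and do not appear in the PR.

```python
# Standalone sketch of the 3-tuple <-> 2-tuple round trip.
# to_pairs and from_pairs are hypothetical helpers for illustration;
# strings stand in for real transformer objects.


def to_pairs(specific_transformers):
    """Drop the columns: (name, trans, cols) -> (name, trans)."""
    try:
        return [(name, trans) for name, trans, _ in specific_transformers]
    except (TypeError, ValueError):
        # None, or entries that are not 3-tuples: hand the value back as-is
        return specific_transformers


def from_pairs(pairs, current):
    """Re-attach the columns stored in `current` to the updated pairs."""
    try:
        return [
            (name, trans, cols)
            for (name, trans), (_, _, cols) in zip(pairs, current)
        ]
    except (TypeError, ValueError):
        # `current` is None or not made of 3-tuples: store the value as-is
        return pairs


current = [
    ("gap", "GapEncoder()", ["city"]),
    ("date", "DatetimeEncoder()", ["when"]),
]
pairs = to_pairs(current)
# Swap the first transformer while keeping its column assignment
updated = from_pairs([("gap", "MinHashEncoder()"), pairs[1]], current)
```

One thing to keep in mind with this pattern: `zip` stops at the shorter input, so if the incoming list and the stored list differ in length, the extra entries are silently dropped rather than raising an error.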