Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEA add link for HTML representation #1051

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

glemaitre
Copy link
Member

This PR add the necessary class and function to add a link to the documentation when showing the HTML representation of an estimator. Unfortunately, there is a bug in scikit-learn that will be corrected in 1.6 and we need to vendor the fix and inherit from the mixin.

@glemaitre glemaitre marked this pull request as draft September 2, 2024 21:54
@jeromedockes
Copy link
Member

if the bug is in scikit-learn and this is not critical, could we just wait for it to be fix in scikit-learn without vendoring the htmlmixin class? also once it is fixed in scikit-learn would we still need to inherit from a private scikit-learn class?

@glemaitre
Copy link
Member Author

could we just wait for it to be fix in scikit-learn without vendoring the htmlmixin class?

Yes but we need to bump the minimum version

also once it is fixed in scikit-learn would we still need to inherit from a private scikit-learn class?

Actually, we don't need to. We only need to set the class attributes. BaseEstimator is inheriting from this private mixin.

@Vincent-Maladiere
Copy link
Member

Yes but we need to bump the minimum version

This is somewhat unrelated to this PR, but we were thinking with @jeromedockes whether or not we should try to support as many Python and scikit-learn versions as possible, for many potential users are often stuck with an old Python version (e.g. APHP). We can talk about this idea somewhere else, though.

@glemaitre
Copy link
Member Author

@Vincent-Maladiere I share the same feeling and this is the reason why to vendor for the moment as a fix.

Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the fix a bit too intrusive for having early access to a scikit-learn feature that doesn't affect skrub itself, but OTOH I completely agree the functionality is useful 🤔

@@ -0,0 +1,103 @@
import itertools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we name this module something more explicit like "_sklearn_html_repr_utils" or similar?

# TODO: subsequently, we should remove the inheritance from _HTMLDocumentationLinkMixin
# for each estimator then.
if sklearn_version > parse_version("1.6"):
from sklearn.utils._estimator_html_repr import _HTMLDocumentationLinkMixin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we will now have 2 of those in every skrub estimator's parents, one directly and one through the BaseEstimator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. The one on the left in the inheritance will be the one used.

@jeromedockes
Copy link
Member

@glemaitre could you do a

git commit --allow-empty -m '[doc build]' && git push

so we can see the new TableVectorizer display in the examples? thanks!

@@ -101,3 +101,9 @@ def doc_link_url_param_generator(estimator):
"estimator_module": estimator_module,
"estimator_name": estimator_name,
}


class _SkrubHTMLDocumentationLinkMixin(_HTMLDocumentationLinkMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even when the scikit-learn fix is released, we will most likely want this mixin to avoid repeating these 3 attributes everywhere. so in the end all we will have to remove is the HTMLDocumentationLinkMixin which is confined to this module.

so after all I am +1 for this PR

@Vincent-Maladiere
Copy link
Member

@glemaitre ok, I misread your first message. I'm +0 or +1 as long as we don't need to bump the minimal version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants