
TST polars support for deduplicate #785

Closed
wants to merge 4 commits

Conversation

@TheooJ (Contributor) commented Oct 6, 2023

Closes issue #789.

Checks whether deduplicate works with polars input in test_deduplicate.py.

I propose catching any exceptions; let me know if you want me to be more specific.

@Vincent-Maladiere (Member)

@TheooJ you have to create a specific issue for the meta-issue you're addressing.

  • This PR should mention the sub-issue only (you can edit your message)
  • The sub-issue should mention the meta-issue only

Otherwise, the meta-issue will be automatically closed when we merge a single PR that refers to it.

@TheooJ (Contributor, Author) commented Oct 10, 2023

Thanks, done.

@Vincent-Maladiere (Member) left a comment

Thanks for this PR. Good job using utils from test_polars! In addition to checking errors, we also want to check the results. In that sense, check that identical Pandas and Polars series inputs return the same output.

If the function you are testing already works fine with Polars, you don't need to catch a potential error. However, if this function doesn't work with Polars yet, we need to:

  1. Add it to a list of functions that don't support Polars yet (create one in test_polars.py)
  2. Use pytest.mark.xfail if the function you are testing belongs to the list. This will skip the test by displaying a soft fail (meaning: we acknowledge that this is failing, but this needs to be fixed). Example in scikit-learn. Note that scikit-learn uses a more complex scheme for xfailing tests by using _xfail_checks tags in estimators. We can keep it simple for now and use a list, without using tags.
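The list-plus-marker scheme described in steps 1 and 2 could be sketched roughly as below; `XFAIL_POLARS` and `supports_polars` are illustrative names, not skrub's actual code:

```python
import pytest

# Hypothetical registry of functions that don't support polars yet; in the
# PR this would live in skrub/dataframe/tests/test_polars.py.
XFAIL_POLARS = ["deduplicate"]


def supports_polars(func_name):
    """Return True if func_name is expected to work on polars input."""
    return func_name not in XFAIL_POLARS


# pytest.mark.xfail takes the condition positionally: while deduplicate is
# on the list, the test still runs but is reported as an expected failure.
@pytest.mark.xfail(
    not supports_polars("deduplicate"),
    reason="Polars not supported for deduplicate yet.",
)
def test_deduplicate_polars_input():
    ...  # would compare pandas and polars outputs here, as suggested above
```

With this scheme, removing a function's name from `XFAIL_POLARS` once polars support lands makes any remaining failure loud again, without touching the test body.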

@TheooJ (Contributor, Author) commented Oct 11, 2023

I've compared the pandas and polars outputs in the same test function. If polars is missing, the test is skipped. If the function applied to the polars input raises an error, or if the pandas and polars outputs don't match, the test xfails.

Let me know what you think

@TheooJ (Contributor, Author) commented Oct 12, 2023

As discussed, I've removed the xfail entirely, since the polars test passes.

If the test were failing, I would add:

```python
# in skrub.dataframe.tests.test_polars
XFAIL_POLARS = ["deduplicate"]

# in test_deduplicate
from skrub.dataframe.tests.test_polars import (
    POLARS_SETUP,
    POLARS_MISSING_MSG,
    XFAIL_POLARS,
)

@pytest.mark.xfail(
    "deduplicate" in XFAIL_POLARS,
    reason="Polars not supported for deduplicate yet.",
)
@pytest.mark.skipif(not POLARS_SETUP, reason=POLARS_MISSING_MSG)
def test_polars_input():
    [...]
```

@Vincent-Maladiere (Member) left a comment

Hey @TheooJ, thank you for this PR! LGTM

@Vincent-Maladiere dismissed their stale review October 13, 2023 15:08

Let's apply suggestions from #769

@TheooJ closed this Nov 14, 2023
This was referenced Nov 14, 2023
@TheooJ deleted the test_deduplicate_polars branch December 20, 2023 23:51