
TST polars support for deduplicate #785

Closed
wants to merge 4 commits

Conversation

@TheooJ (Contributor) commented Oct 6, 2023

Closes issue #789.

Checks whether deduplicate works with polars input in test_deduplicate.py.

I propose catching any exceptions; let me know if you want me to be more specific.

@Vincent-Maladiere (Member)

@TheooJ you have to create a specific issue for the meta-issue you're addressing.

  • This PR should mention the sub-issue only (you can edit your message)
  • The sub-issue should mention the meta-issue only

Otherwise, the meta-issue will be automatically closed when we merge a single PR that refers to it.

@TheooJ (Contributor, Author) commented Oct 10, 2023

Thanks, done.

@Vincent-Maladiere (Member) left a comment

Thanks for this PR. Good job using utils from test_polars! In addition to checking errors, we also want to check the results. In that sense, check that identical Pandas and Polars series inputs return the same output.

If the function you are testing already works fine with Polars, you don't need to catch a potential error. However, if this function doesn't work with Polars yet, we need to:

  1. Add it to a list of functions that don't support Polars yet (create one in test_polars.py)
  2. Use pytest.mark.xfail if the function you are testing belongs to the list. This will skip the test by displaying a soft fail (meaning: we acknowledge that this is failing, but this needs to be fixed). Example in scikit-learn. Note that scikit-learn uses a more complex scheme for xfailing tests by using _xfail_checks tags in estimators. We can keep it simple for now and use a list, without using tags.
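The list-plus-marker scheme described in steps 1 and 2 could be sketched roughly as below; `XFAIL_POLARS` and `supports_polars` are illustrative names, not skrub's actual code:

```python
import pytest

# Hypothetical registry of functions that don't support polars yet; in the
# PR this would live in skrub/dataframe/tests/test_polars.py.
XFAIL_POLARS = ["deduplicate"]


def supports_polars(func_name):
    """Return True if func_name is expected to work on polars input."""
    return func_name not in XFAIL_POLARS


# pytest.mark.xfail takes the condition positionally: while deduplicate is
# on the list, the test still runs but is reported as an expected failure.
@pytest.mark.xfail(
    not supports_polars("deduplicate"),
    reason="Polars not supported for deduplicate yet.",
)
def test_deduplicate_polars_input():
    ...  # would compare pandas and polars outputs here, as suggested above
```

With this scheme, removing a function's name from `XFAIL_POLARS` once polars support lands makes any remaining failure loud again, without touching the test body.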

@TheooJ (Contributor, Author) commented Oct 11, 2023

I've compared the pandas and polars outputs in the same test function. If polars is missing, the test is skipped. If the function applied to the polars input raises an error, or if the pandas and polars outputs don't match, the test xfails.

Let me know what you think

@TheooJ (Contributor, Author) commented Oct 12, 2023

As discussed, I've removed the xfail entirely, since the polars test passes.

If the test were failing, I would add:

```python
# in skrub.dataframe.tests.test_polars
XFAIL_POLARS = ["deduplicate"]

# in test_deduplicate
from skrub.dataframe.tests.test_polars import (
    POLARS_SETUP,
    POLARS_MISSING_MSG,
    XFAIL_POLARS,
)

@pytest.mark.xfail(
    "deduplicate" in XFAIL_POLARS,
    reason="Polars not supported for deduplicate yet.",
)
@pytest.mark.skipif(not POLARS_SETUP, reason=POLARS_MISSING_MSG)
def test_polars_input():
    [...]
```

@Vincent-Maladiere (Member) left a comment

Hey @TheooJ, thank you for this PR! LGTM

@Vincent-Maladiere dismissed their stale review October 13, 2023 15:08

Let's apply suggestions from #769

@TheooJ closed this Nov 14, 2023
This was referenced Nov 14, 2023
@TheooJ deleted the test_deduplicate_polars branch December 20, 2023 23:51