Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Local Testing] SNOW-904261 Support DataFrame.except_ #1076

Merged
merged 1 commit into from
Oct 6, 2023

Conversation

sfc-gh-stan
Copy link
Collaborator

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #NNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@@ -411,13 +403,13 @@ def test_project_should_not_be_pushed_down_through_intersect_or_except(session):
assert df1.except_(df2).count() == 70


# TODO: Fix this, `MockExecutionPlan.attributes` are ignoring nullability for now
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to mark it as localtestable.

@sfc-gh-stan sfc-gh-stan marked this pull request as ready for review October 4, 2023 20:24
@sfc-gh-stan sfc-gh-stan requested a review from a team as a code owner October 4, 2023 20:24
@sfc-gh-stan sfc-gh-stan requested review from sfc-gh-aling and sfc-gh-aalam and removed request for a team October 4, 2023 20:24
fields = [
StructField(
f.name,
merge_type(
f.datatype,
nfs.get(f.name, NullType()),
name_to_datatype_b.get(f.name, NullType()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the data type is not nullable in b?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If f.name does not exist in name_to_datatype_b, it shouldn't exist in name_to_nullable_b either, which means nullable will be set to True.

# Dedup all none rows
if res_df.isnull().all(axis=1).where(lambda x: x).count() > 1:
res_df = res_df.drop(index=res_df.isnull().all(axis=1).index[1:])
# If there are all none rows in cur_df, drop all none rows in res_df
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this comment. Could you elaborate it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because in pandas NaN == NaN evaluates to False. So NOT IS IN won't drop rows that are all NaN/None (all-none-row's) , neither does drop_duplicates remove those rows even if there are multiple.

bool(any([bool(item is None) for item in result[c]])),
result[
c
].sf_type.nullable, # bool(any([bool(item is None) for item in result[c]]))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove the trailing comment.

@sfc-gh-stan
Copy link
Collaborator Author

I cleaned up the code and made some refactor in #1077 , merging this first.

@sfc-gh-stan sfc-gh-stan merged commit 61e0497 into dev/local-testing Oct 6, 2023
@sfc-gh-stan sfc-gh-stan deleted the local/support-except branch October 6, 2023 18:25
@github-actions github-actions bot locked and limited conversation to collaborators Oct 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants