[Local Testing] SNOW-904261 Support DataFrame.except_ #1076

sfc-gh-stan · 2023-10-04T18:46:36Z

Please answer these questions before submitting your pull requests. Thanks!

What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes #NNNN

Fill out the following pre-review checklist:

- I am adding a new automated test(s) to verify correctness of my new code
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency

Please describe how your code solves the related issue.

Please write a short description of how your code change solves the related issue.

sfc-gh-stan · 2023-10-04T20:17:54Z

tests/integ/scala/test_dataframe_set_operations_suite.py

@@ -411,13 +403,13 @@ def test_project_should_not_be_pushed_down_through_intersect_or_except(session):
    assert df1.except_(df2).count() == 70


-# TODO: Fix this, `MockExecutionPlan.attributes` are ignoring nullability for now


Forgot to mark it as localtestable.

sfc-gh-sfan · 2023-10-04T23:43:58Z

src/snowflake/snowpark/_internal/type_utils.py

        fields = [
            StructField(
                f.name,
                merge_type(
                    f.datatype,
-                    nfs.get(f.name, NullType()),
+                    name_to_datatype_b.get(f.name, NullType()),


What if the data type is not nullable in b?

If f.name does not exist in name_to_datatype_b, it shouldn't exist in name_to_nullable_b either, which means nullable will be set to True.
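To make the reply above concrete, here is a minimal sketch of the two lookups it describes. It does not use the Snowpark types: fields are plain tuples, the string "null" stands in for NullType(), and the helper name lookup_in_b is hypothetical. The only behaviour taken from the thread is that a name missing from b falls back to NullType() for the data type and to True for nullable.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a struct field: (name, datatype, nullable).
# The string "null" plays the role of NullType() in this sketch.
Field = Tuple[str, str, bool]


def lookup_in_b(fields_a: List[Field], fields_b: List[Field]) -> List[Field]:
    """For every field of a, report what the two lookups into b would return."""
    name_to_datatype_b: Dict[str, str] = {name: dtype for name, dtype, _ in fields_b}
    name_to_nullable_b: Dict[str, bool] = {name: nullable for name, _, nullable in fields_b}
    return [
        (
            name,
            name_to_datatype_b.get(name, "null"),  # missing in b -> null type
            name_to_nullable_b.get(name, True),    # missing in b -> nullable
        )
        for name, _, _ in fields_a
    ]


fields_a = [("id", "long", False), ("extra", "string", False)]
fields_b = [("id", "long", False)]
looked_up = lookup_in_b(fields_a, fields_b)
```

Because both dictionaries are built from the same fields of b, a name is either present in both or absent from both, so the two defaults always apply together.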

sfc-gh-sfan · 2023-10-04T23:46:27Z

src/snowflake/snowpark/mock/plan.py

+                # Dedup all none rows
+                if res_df.isnull().all(axis=1).where(lambda x: x).count() > 1:
+                    res_df = res_df.drop(index=res_df.isnull().all(axis=1).index[1:])
+                # If there are all none rows in cur_df, drop all none rows in res_df


I don't think I understand this comment. Could you elaborate on it?

This is because in pandas NaN == NaN evaluates to False. So NOT IS IN won't drop rows that are entirely NaN/None (all-None rows), and drop_duplicates doesn't remove those rows either, even when there are several of them.
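A small sketch of the point about NaN comparison, assuming plain pandas DataFrames rather than the mock plan's own table type. It shows only the elementwise comparison and one mask-based way to keep at most one all-null row. The helper name dedup_all_null_rows is hypothetical, and the sketch assumes the index labels are unique, since rows are dropped by label.

```python
import numpy as np
import pandas as pd


def dedup_all_null_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep at most one row whose values are all null (hypothetical helper)."""
    all_null = df.isnull().all(axis=1)  # boolean mask, one entry per row
    # Labels of every all-null row after the first one.
    extra = all_null[all_null].index[1:]
    return df.drop(index=extra)


df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 2.0],
        "b": [np.nan, np.nan, np.nan, 3.0],
    }
)

# Elementwise comparison: positions holding NaN compare as not equal to themselves.
elementwise_equal = df["a"] == df["a"]

# Rows 1 and 2 are entirely null; only the first of them is kept.
deduped = dedup_all_null_rows(df)
```

Selecting the labels through the mask itself (all_null[all_null]) matters here: slicing the index of the unfiltered mask would name every row after the first, not just the all-null ones.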

sfc-gh-stan · 2023-10-05T23:28:27Z

src/snowflake/snowpark/mock/plan.py

-                    bool(any([bool(item is None) for item in result[c]])),
+                    result[
+                        c
+                    ].sf_type.nullable,  # bool(any([bool(item is None) for item in result[c]]))


TODO: remove the trailing comment.

sfc-gh-stan · 2023-10-06T18:25:38Z

I cleaned up the code and did some refactoring in #1077; merging this first.

Add changes

c18322d

sfc-gh-stan marked this pull request as ready for review October 4, 2023 20:24

sfc-gh-stan requested a review from a team as a code owner October 4, 2023 20:24

sfc-gh-stan requested review from sfc-gh-aling and sfc-gh-aalam and removed request for a team October 4, 2023 20:24

sfc-gh-sfan reviewed Oct 4, 2023

sfc-gh-stan commented Oct 5, 2023

sfc-gh-sfan approved these changes Oct 6, 2023

sfc-gh-stan merged commit 61e0497 into dev/local-testing Oct 6, 2023

sfc-gh-stan deleted the local/support-except branch October 6, 2023 18:25

github-actions bot locked and limited conversation to collaborators Oct 6, 2023
