Add RelationalGroupedDataFrame.pivot() #1130

sfc-gh-aalam · 2023-11-05T23:27:33Z

Please answer these questions before submitting your pull requests. Thanks!

What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-944062: Implementation and functionality of pivot differs from PySpark and is not user-friendly #1093

Fill out the following pre-review checklist:
- I am adding new automated test(s) to verify correctness of my new code
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency

Please describe how your code solves the related issue.

Added a pivot method to RelationalGroupedDataFrame, which allows pivoting via df.group_by().pivot().
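As a rough illustration of the result shape a grouped pivot produces, the plain-Python sketch below groups by one column, pivots on a second, and sums a third. It is not the Snowpark implementation: `grouped_pivot_sum` is a hypothetical helper, the rows are copied from the monthly_sales docstring in this PR, and the Snowpark call it approximates appears only in a comment.

```python
from collections import defaultdict

# Rows are (empid, team, amount, month), copied from the docstring example.
ROWS = [
    (1, "A", 10000, "JAN"),
    (1, "B", 400, "JAN"),
    (2, "A", 4500, "JAN"),
    (2, "A", 35000, "JAN"),
    (1, "B", 5000, "FEB"),
    (1, "A", 3000, "FEB"),
    (2, "B", 200, "FEB"),
]


def grouped_pivot_sum(rows, group_idx, pivot_idx, value_idx, pivot_values):
    """Group rows by one column and sum the value column per pivot value."""
    totals = defaultdict(lambda: {v: None for v in pivot_values})
    for row in rows:
        bucket = totals[row[group_idx]]
        pivot_value = row[pivot_idx]
        if pivot_value not in bucket:
            continue  # pivot values outside the requested list are ignored
        bucket[pivot_value] = (bucket[pivot_value] or 0) + row[value_idx]
    return dict(totals)


# Approximates: df.group_by("empid").pivot("month", ["JAN", "FEB"]).sum("amount")
result = grouped_pivot_sum(ROWS, 0, 3, 2, ["JAN", "FEB"])
print(result)
```

Each group key maps to one entry per requested pivot value, which is the wide layout the grouped pivot is meant to return.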

sfc-gh-aalam · 2023-11-05T23:33:04Z

src/snowflake/snowpark/relational_grouped_dataframe.py

+        if values is None:
+            distinct_values = (
+                self._df.select(pivot_col).distinct()._internal_collect_with_tag()
+            )
+            value_exprs = [Literal(v[0]) for v in distinct_values]


Should we add this to DataFrame.pivot as well, with a documentation update saying this method is not recommended?

sfc-gh-sfan · 2023-11-06T17:03:29Z

src/snowflake/snowpark/relational_grouped_dataframe.py

+            >>> create_result = session.sql('''create or replace temp table monthly_sales(empid int, team text, amount int, month text)
+            ... as select * from values
+            ... (1, 'A', 10000, 'JAN'),
+            ... (1, 'B', 400, 'JAN'),
+            ... (2, 'A', 4500, 'JAN'),
+            ... (2, 'A', 35000, 'JAN'),
+            ... (1, 'B', 5000, 'FEB'),
+            ... (1, 'A', 3000, 'FEB'),
+            ... (2, 'B', 200, 'FEB') ''').collect()


I wonder if we could use create_dataframe then save_as_table to represent this session.sql? Maybe it is not worth it but just thinking out loud.
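A minimal sketch of that alternative, assuming `session` is an already-connected Snowpark `Session`. The helper name `create_monthly_sales` is hypothetical; `create_dataframe`, `write.save_as_table`, and `table` are the Snowpark calls being suggested, and the keyword arguments shown are one possible choice rather than what the PR settled on.

```python
def create_monthly_sales(session, table_name="monthly_sales"):
    """Build the sample table from Python data instead of a raw SQL string.

    `session` is assumed to be a connected snowflake.snowpark.Session.
    """
    rows = [
        (1, "A", 10000, "JAN"),
        (1, "B", 400, "JAN"),
        (2, "A", 4500, "JAN"),
        (2, "A", 35000, "JAN"),
        (1, "B", 5000, "FEB"),
        (1, "A", 3000, "FEB"),
        (2, "B", 200, "FEB"),
    ]
    df = session.create_dataframe(
        rows, schema=["empid", "team", "amount", "month"]
    )
    # Temporary table, replaced if it already exists (mirrors
    # "create or replace temp table" in the SQL version).
    df.write.save_as_table(table_name, mode="overwrite", table_type="temporary")
    return session.table(table_name)
```

The trade-off is a slightly longer docstring in exchange for not embedding SQL text in the example.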

sfc-gh-sfan · 2023-11-06T17:07:29Z

src/snowflake/snowpark/relational_grouped_dataframe.py

+            distinct_values = (
+                self._df.select(pivot_col).distinct()._internal_collect_with_tag()
+            )


This is actually a blocking call, and might not be cheap. I wonder if we should highlight why this is not efficient in the documentation, in addition to saying this is not recommended.

Alternatively, should we lazily evaluate this query (put it into the query plan), or at least run an async query here and fetch the result only when the values are required?
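One way to picture the "fetch only when required" idea is a small wrapper that defers the lookup until first use. This is a sketch under assumptions, not code from the PR: `LazyDistinctValues` and `fake_fetch` are hypothetical, and in Snowpark the fetch callable could, for example, wrap an async job started with `collect_nowait()` and read its result later.

```python
from typing import Callable, List, Optional


class LazyDistinctValues:
    """Defers a potentially expensive lookup until the values are first read."""

    def __init__(self, fetch: Callable[[], List[object]]) -> None:
        self._fetch = fetch
        self._cached: Optional[List[object]] = None

    def get(self) -> List[object]:
        if self._cached is None:
            # The blocking call happens here, and only once.
            self._cached = self._fetch()
        return self._cached


calls = []


def fake_fetch() -> List[object]:
    """Stand-in for the distinct-values query; records each invocation."""
    calls.append("query")
    return ["JAN", "FEB"]


lazy = LazyDistinctValues(fake_fetch)  # nothing is executed yet
values = lazy.get()                    # first access triggers the fetch
values_again = lazy.get()              # served from the cache
```

Constructing the wrapper costs nothing; the query cost is paid at the first `get()`, which is where a pivot would actually need the values.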

sfc-gh-sfan

Let's make sure we have a TODO for distinct values

sfc-gh-sfan

Do we need to raise if values is None?

sfc-gh-aalam · 2023-11-10T18:52:28Z

Let's make sure we have a TODO for distinct values

https://snowflakecomputing.atlassian.net/browse/SNOW-967385

sfc-gh-aalam · 2023-11-10T18:53:20Z

Do we need to raise if values is None?

Right now, df.pivot doesn't raise either, but we could raise a ValueError in both places saying values cannot be empty. Currently we get a "NoneType is not iterable" error.

sfc-gh-sfan · 2023-11-11T00:36:12Z

I see. Both work; an explicit error message is probably easier for users to understand :)
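A minimal sketch of the explicit check being discussed. The helper name `check_pivot_values` and the message wording are hypothetical and are not taken from the merged code.

```python
from typing import Iterable, List, Optional


def check_pivot_values(values: Optional[Iterable[object]]) -> List[object]:
    """Fail early with a readable message instead of a bare iteration error."""
    if values is None:
        raise ValueError(
            "values cannot be None; pass the pivot values explicitly"
        )
    materialized = list(values)
    if not materialized:
        raise ValueError("values cannot be empty")
    return materialized
```

Calling `check_pivot_values(None)` raises a `ValueError` that names the parameter, rather than surfacing the iteration failure from deeper in the call.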

sfc-gh-mkeller · 2023-11-16T18:44:25Z

src/snowflake/snowpark/relational_grouped_dataframe.py

+            v._expression if isinstance(v, Column) else Literal(v) for v in values
+        ]
+        self._group_type = _PivotType(pc[0], value_exprs)
+        return self


You could have the return-type be typing_extensions.Self here
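A small sketch of what that suggestion buys, using hypothetical stand-in classes rather than the real Snowpark ones. The import sits under `TYPE_CHECKING` and the annotation is quoted so the snippet runs even where `typing_extensions` is not installed; that arrangement is a choice made for this example only.

```python
from typing import TYPE_CHECKING, List, Optional

if TYPE_CHECKING:
    # Only needed by type checkers; typing.Self also exists from Python 3.11.
    from typing_extensions import Self


class GroupedFrame:
    """Hypothetical stand-in for a grouped dataframe with a chainable pivot()."""

    def __init__(self) -> None:
        self.pivot_col: Optional[str] = None
        self.pivot_values: List[object] = []

    def pivot(self, pivot_col: str, values: List[object]) -> "Self":
        self.pivot_col = pivot_col
        self.pivot_values = list(values)
        return self  # same object, so calls can be chained


class AuditedGroupedFrame(GroupedFrame):
    """With Self, a type checker sees pivot() as returning AuditedGroupedFrame."""


frame = AuditedGroupedFrame()
chained = frame.pivot("month", ["JAN", "FEB"])
```

Annotating with the concrete class name instead would make a type checker treat `chained` as the base class, even though the runtime object is the subclass.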

sfc-gh-mkeller

🚢 🚢

Add RelationalGroupedDataFrame.pivot()

b9c291e

sfc-gh-aalam commented Nov 5, 2023

add comments to make code more readable

3ca2489

sfc-gh-aalam marked this pull request as ready for review November 5, 2023 23:34

sfc-gh-aalam requested a review from a team as a code owner November 5, 2023 23:34

sfc-gh-aalam requested review from sfc-gh-yixie and sfc-gh-evandenberg November 5, 2023 23:34

sfc-gh-sfan reviewed Nov 6, 2023

Do not infer distinct values; follow-up later

bbccad5

sfc-gh-sfan approved these changes Nov 10, 2023

sfc-gh-sfan reviewed Nov 10, 2023

sfc-gh-aalam added 3 commits November 12, 2023 17:42

add value error; fix lint

683cd03

Merge branch 'main' into aalam-SNOW-944062-usage-of-pivot-differs-fro…

b104175

…m-py-spark

Merge branch 'main' into aalam-SNOW-944062-usage-of-pivot-differs-fro…

ce1ebe6

…m-py-spark

sfc-gh-mkeller requested changes Nov 16, 2023

sfc-gh-aalam added 4 commits November 16, 2023 13:56

fix mypy

3495c79

fix error message

0f63ec3

fix mypy

6e06ffc

fix mypy

cac3be1

sfc-gh-mkeller approved these changes Nov 17, 2023

sfc-gh-aalam merged commit da4be4a into main Nov 17, 2023
51 checks passed

sfc-gh-aalam deleted the aalam-SNOW-944062-usage-of-pivot-differs-from-py-spark branch November 17, 2023 17:38

github-actions bot locked and limited conversation to collaborators Nov 17, 2023
