[SNOW-1649172]: Fix `loc` set when setting DataFrame row with Series value #2213

sfc-gh-rdurrani · 2024-09-03T19:02:45Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-1649172
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
Please describe how your code solves the related issue.

When executing `df.loc[x] = series`, an error occurs because the Series does not have the same number of columns as the DataFrame being set. Instead, the Series should be transposed and set as a row, regardless of whether it has as many rows as the DataFrame has columns.
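For reference, here is a minimal sketch of the same assignment in plain pandas, which is presumably the behavior the fix aims to match. It does not use Snowpark pandas, and the frame, labels, and values are made up for illustration rather than taken from the PR's tests.

```python
import pandas as pd

# Illustrative data only; not taken from the PR's tests.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
row = pd.Series([20, 10], index=["b", "a"])

# The Series is applied across the columns of row "x" (as if transposed).
# In plain pandas the values are matched to the columns by label.
df.loc["x"] = row

print(df)
```

Only row `"x"` changes; row `"y"` keeps its original values.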

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

src/snowflake/snowpark/modin/plugin/docstrings/base.py

sfc-gh-azhan · 2024-09-19T21:17:38Z

Please describe what the problem is.

sfc-gh-azhan

Please describe what the issue was.

tests/integ/modin/frame/test_iloc.py

tests/integ/modin/frame/test_loc.py

src/snowflake/snowpark/modin/pandas/indexing.py

sfc-gh-azhan · 2024-09-19T21:37:32Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+    original_index = index
+    # If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
+    # across columns rather than rows.
+    if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1):  # type: ignore[arg-type]


Can you wrap it in a function and use the function name to briefly describe what it does?

What does `(columns == slice(None) or len(columns) > 1)` mean?

This `# type: ignore[arg-type]` actually indicates that something is wrong. You didn't consider all type cases.

I believe this is checking whether more than one column is being set. As for the arg-type, I think that is because it's ignoring the case where `columns` is a `SnowflakeQueryCompiler`? I've added a test for that case, and will fix it!
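One way to name that check is sketched below in plain Python. The helper `is_multi_column_set` and its `num_frame_columns` parameter are hypothetical and not part of the PR; the sketch covers only `slice(None)`, scalar labels, and sized collections, and does not model a `SnowflakeQueryCompiler` key.

```python
from collections.abc import Sized


def is_multi_column_set(columns, num_frame_columns):
    """Hypothetical helper: True when the column key selects more than one column."""
    # Check the type first so `==` is never applied to an array-like key.
    if isinstance(columns, slice) and columns == slice(None):
        return num_frame_columns > 1
    # A string is Sized but is a single scalar label, not a collection.
    if isinstance(columns, (str, bytes)):
        return False
    if isinstance(columns, Sized):
        return len(columns) > 1
    # Any other scalar label selects exactly one column.
    return False


print(is_multi_column_set(slice(None), 3))
print(is_multi_column_set(["a"], 3))
```

Branching on the type before comparing also avoids needing a `# type: ignore` on the comparison.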

sfc-gh-azhan · 2024-09-19T21:40:59Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+            item, col_len, move_index_to_cols=True
+        )
+
+        if is_scalar(index):


What happens if index is not scalar?

If index is not scalar, we don't have to append it to the item to match the index. It should be either `slice(None)` or an `InternalFrame`, which we handle in the rest of the method.
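The scalar case can be pictured with a plain pandas sketch: the Series is flipped into a one-row frame, and the scalar row label becomes that row's index so it can be matched against the target frame. The helper `series_to_row` is hypothetical and only mirrors the idea; the PR itself works on internal frames, not pandas objects.

```python
import pandas as pd


def series_to_row(item: pd.Series, index_label) -> pd.DataFrame:
    """Hypothetical helper: turn a Series into a one-row frame keyed by index_label."""
    row = item.to_frame().T    # the Series index becomes the column labels
    row.index = [index_label]  # the scalar row label becomes the row's index
    return row


row = series_to_row(pd.Series([10, 20], index=["a", "b"]), "x")
print(row)
```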

sfc-gh-azhan · 2024-09-19T21:50:48Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+    original_index = index
+    # If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
+    # across columns rather than rows.
+    if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1):  # type: ignore[arg-type]


This should be done in `_set_2d_labels_helper_for_frame_item`.

I actually think it needs to be done in this method, since we need to modify item before the map is created (which is passed into `_set_2d_labels_helper_for_frame_item`), and we need the modified item later on in this method.

I can move this into the conditional for if item_is_frame though!

sfc-gh-rdurrani · 2024-10-02T00:36:41Z

Will add additional tests once the question of whether to match by position or by labels is resolved: https://snowflake.slack.com/archives/C04HF38JFAQ/p1727828020400139?thread_ts=1727824503.275869&cid=C04HF38JFAQ

CHANGELOG.md

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

sfc-gh-helmeleegy

LGTM. Thanks for addressing the comments.

sfc-gh-azhan · 2024-10-04T22:10:52Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

@@ -1954,6 +1955,82 @@ def _set_2d_labels_helper_for_single_column_wise_item(
    ).result_frame


+def _convert_series_item_to_row_for_set_frame_2d_labels(


Can you move the index operation to another helper function (or just outside of this one)? The name of this function doesn't say anything about changing the index.

sfc-gh-azhan · 2024-10-04T22:14:13Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+        )
+        return end - start
+
+    if columns == slice(None):


This can be a helper function like `get_column_length` in indexing_utils.py.
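A rough sketch of what such a helper might look like is given below in plain Python. It is illustrative only: the signature, the `frame_columns` list of flat column labels, and the handling of label slices (inclusive on both ends, step ignored) are assumptions, and a `SnowflakeQueryCompiler` key is not modeled.

```python
from collections.abc import Sized


def get_column_length(columns, frame_columns):
    """Hypothetical sketch: number of columns selected by a loc-style column key."""
    if isinstance(columns, slice):
        # Label slices are inclusive on both ends; the step is ignored here.
        start = 0 if columns.start is None else frame_columns.index(columns.start)
        end = (
            len(frame_columns)
            if columns.stop is None
            else frame_columns.index(columns.stop) + 1
        )
        return max(end - start, 0)
    if isinstance(columns, (str, bytes)) or not isinstance(columns, Sized):
        return 1  # a single scalar label
    return len(columns)


labels = ["a", "b", "c"]
print(get_column_length(slice(None), labels))
print(get_column_length(slice("a", "a"), labels))
```

Note that a slice such as `"a":"a"` selects a single column, so a slice key alone does not imply a multi-column set.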

sfc-gh-azhan · 2024-10-04T22:18:48Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+    else:
+        col_len = len(columns.index)
+
+    if isinstance(columns, SnowflakeQueryCompiler):


Can you add comments explaining what you are trying to do here and on the next line?

sfc-gh-azhan · 2024-10-04T22:23:10Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+    )
+
+    if is_scalar(index):
+        new_item = item.append_column("__index__", pandas_lit(index))


You should use an API in SQC to set the index or reindex. Manually setting the column can lead to bugs. Once you have `new_item_sqc`, you can set `item = new_item_sqc._modin_frame`.

sfc-gh-azhan · 2024-10-04T22:26:44Z

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

+        # across columns rather than rows.
+        is_multi_col_set = (
+            (isinstance(columns, Sized) and len(columns) > 1)
+            or isinstance(columns, slice)


The slice and QC cases can be a single column, right?

[SNOW-1649172]: Fix loc set when setting DataFrame row with Series …

3151ed7

…value

sfc-gh-rdurrani requested a review from a team as a code owner September 3, 2024 19:02

sfc-gh-rdurrani requested review from sfc-gh-lspiegelberg and sfc-gh-jkew September 3, 2024 19:02

sfc-gh-rdurrani added the NO-PANDAS-CHANGEDOC-UPDATES label (This PR does not update Snowpark pandas docs) Sep 3, 2024

github-actions bot added the snowpark-pandas label Sep 3, 2024

sfc-gh-rdurrani enabled auto-merge (squash) September 3, 2024 19:30

sfc-gh-rdurrani and others added 3 commits September 3, 2024 12:45

Add some more tests (including some negatives)

2bd792f

Fix tests

9e2a26d

minor changes

66f01bc

sfc-gh-vbudati disabled auto-merge September 6, 2024 20:45

sfc-gh-vbudati added 4 commits September 9, 2024 14:41

fix test

c3b9582

Merge branch 'main' into rdurrani-SNOW-1649172

a9aceb9

# Conflicts: # CHANGELOG.md # src/snowflake/snowpark/modin/pandas/series.py # tests/integ/modin/frame/test_loc.py

fix bug

c18ae1f

fix tests

c159e3a

sfc-gh-helmeleegy reviewed Sep 11, 2024

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py (outdated)

add example

2289960

sfc-gh-vbudati requested a review from sfc-gh-helmeleegy September 11, 2024 22:53

sfc-gh-vbudati reviewed Sep 12, 2024

src/snowflake/snowpark/modin/plugin/docstrings/base.py

sfc-gh-rdurrani added 3 commits September 18, 2024 14:30

Merge branch 'main' into rdurrani-SNOW-1649172

89401a8

Merge branch 'main' into rdurrani-SNOW-1649172

25ccbb9

Merge branch 'main' into rdurrani-SNOW-1649172

8f75bec

sfc-gh-jjiao requested a review from sfc-gh-yzou September 19, 2024 16:36

Merge branch 'main' into rdurrani-SNOW-1649172

f8797d8

sfc-gh-jjiao requested a review from sfc-gh-azhan September 19, 2024 21:09

Merge branch 'main' into rdurrani-SNOW-1649172

c252eb5

sfc-gh-rdurrani enabled auto-merge (squash) September 19, 2024 21:41

sfc-gh-azhan requested changes Sep 19, 2024

Merge branch 'main' into rdurrani-SNOW-1649172

247fb02

sfc-gh-rdurrani added 3 commits October 1, 2024 17:31

Address review comments

81d8752

Merge branch 'main' into rdurrani-SNOW-1649172

ad817c4

Address potential bug

88b8f86

sfc-gh-helmeleegy reviewed Oct 2, 2024

CHANGELOG.md (outdated)

sfc-gh-rdurrani added 2 commits October 3, 2024 13:15

Merge branch 'main' into rdurrani-SNOW-1649172

230a015

Add tests

8e4f3e7

sfc-gh-jjiao requested a review from sfc-gh-azhan October 3, 2024 21:34

sfc-gh-azhan reviewed Oct 3, 2024

CHANGELOG.md (outdated)

src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py

sfc-gh-rdurrani added 6 commits October 3, 2024 15:25

Fix tests

6f4ba8e

Address review comments

15c48e3

Update docs

ecac5a0

Refactor into helper method

a3a1fb0

Update test coverage

6ade2b9

Merge branch 'main' into rdurrani-SNOW-1649172

13a54ab

sfc-gh-helmeleegy approved these changes Oct 4, 2024

sfc-gh-azhan reviewed Oct 4, 2024
