[SNOW-1649172]: Fix loc set when setting DataFrame row with Series value #2213

Open
wants to merge 26 commits into
base: main

sfc-gh-rdurrani
Contributor

@sfc-gh-rdurrani sfc-gh-rdurrani commented Sep 3, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1649172

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

When doing df.loc[x] = series, an error occurs because the Series does not have the same number of columns as the DataFrame being set. Instead, the Series should be transposed and set as a row, regardless of whether its length equals the number of columns in the DataFrame.
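
For reference, here is a minimal sketch of the native pandas behavior this change aims to match, using made-up data. Assigning a Series to a single row through loc writes the Series values across that row's columns. In native pandas the Series index is aligned with the DataFrame's column labels; whether Snowpark pandas should match by label or by position is a separate question raised later in this conversation.

```python
import pandas as pd

# Toy frame; the labels and values here are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# The Series has one entry per column; its index holds column labels,
# deliberately listed in a different order than df.columns.
row_values = pd.Series({"b": 30, "a": 10})

# Native pandas aligns the Series index with df.columns and writes the
# values across row "x", leaving row "y" untouched.
df.loc["x"] = row_values

print(df)
```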

@sfc-gh-rdurrani sfc-gh-rdurrani requested a review from a team as a code owner September 3, 2024 19:02
@sfc-gh-rdurrani sfc-gh-rdurrani added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Sep 3, 2024
@sfc-gh-rdurrani sfc-gh-rdurrani enabled auto-merge (squash) September 3, 2024 19:30
# Conflicts:
#	CHANGELOG.md
#	src/snowflake/snowpark/modin/pandas/series.py
#	tests/integ/modin/frame/test_loc.py
@sfc-gh-azhan
Collaborator

sfc-gh-azhan commented Sep 19, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1649172

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code

      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages

    • I am adding a new telemetry message

    • I am adding new credentials

    • I am adding a new dependency

    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.

  3. Please describe how your code solves the related issue.
    Please write a short description of how your code change solves the related issue.

Please describe what the problem is.

@sfc-gh-rdurrani sfc-gh-rdurrani enabled auto-merge (squash) September 19, 2024 21:41
Collaborator

@sfc-gh-azhan sfc-gh-azhan left a comment

Please describe what the issue was.

tests/integ/modin/frame/test_iloc.py (outdated, resolved)
tests/integ/modin/frame/test_loc.py (outdated, resolved)
src/snowflake/snowpark/modin/pandas/indexing.py (outdated, resolved)
original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1): # type: ignore[arg-type]
Collaborator

Can you wrap this in a function and use the function name to summarize what it does?

Collaborator

What does (columns == slice(None) or len(columns) > 1) mean?

Collaborator

This type: ignore[arg-type] actually indicates that something is wrong. You didn't consider all type cases.

Contributor Author

I believe this checks whether more than one column is being set. As for the arg-type ignore, I think it is there because the check does not account for columns being a SnowflakeQueryCompiler. I've added a test for that case, and will fix it!

item, col_len, move_index_to_cols=True
)

if is_scalar(index):
Collaborator

What happens if index is not scalar?

Contributor Author

If index is not scalar, we don't have to append it to the item to match the index; it should be either slice(None) or an InternalFrame, both of which we handle in the rest of the method.
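
As a rough analogy in plain pandas (not the Snowpark pandas internals), the scalar case described above amounts to turning the Series into a one-row frame and giving that row the scalar label, so that it can then be matched against the target frame's index. The variable names and data below are illustrative assumptions.

```python
import pandas as pd

item = pd.Series([10, 30], index=["a", "b"])  # values to write across columns
index = "x"  # a scalar row label, as in df.loc["x"] = item

# Flip the Series so that its entries run across columns instead of down rows.
row = item.to_frame().T

# A scalar row label carries no index of its own, so attach it to the new row.
row.index = [index]

print(row)
```

When the row key is already a slice or a frame of labels, there is no single label to attach, which is why only the scalar case needs this extra step.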

original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1): # type: ignore[arg-type]
Collaborator

This should be done in _set_2d_labels_helper_for_frame_item

Contributor Author

I actually think it needs to be done in this method, since we need to modify item before the map is created (which is passed into _set_2d_labels_helper_for_frame_item), and we need the modified item later on in this method.

Contributor Author

I can move this into the conditional for if item_is_frame though!

@sfc-gh-rdurrani
Contributor Author

Will add additional tests once the match by position or labels issue is resolved: https://snowflake.slack.com/archives/C04HF38JFAQ/p1727828020400139?thread_ts=1727824503.275869&cid=C04HF38JFAQ

CHANGELOG.md (outdated, resolved)
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

LGTM. Thanks for addressing the comments.

@@ -1954,6 +1955,82 @@ def _set_2d_labels_helper_for_single_column_wise_item(
).result_frame


def _convert_series_item_to_row_for_set_frame_2d_labels(
Collaborator

@sfc-gh-azhan sfc-gh-azhan Oct 4, 2024

Can you move the index operation into another helper function (or just outside of this one)? The name of this function doesn't say anything about changing the index.

)
return end - start

if columns == slice(None):
Collaborator

This can be a helper function like get_column_length in indexing_util.py.

else:
col_len = len(columns.index)

if isinstance(columns, SnowflakeQueryCompiler):
Collaborator

Can you add comments explaining what you are trying to do here and on the next line?

)

if is_scalar(index):
new_item = item.append_column("__index__", pandas_lit(index))
Collaborator

You should use an API in SQC to set the index or reindex. Manually setting the column can lead to bugs. Once you have new_item_sqc, you can set item = new_item_sqc._modin_frame.

# across columns rather than rows.
is_multi_col_set = (
(isinstance(columns, Sized) and len(columns) > 1)
or isinstance(columns, slice)
Collaborator

The slice and QC cases can be a single column, right?
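
One way to address this concern, sketched in plain Python under the assumption that the column key has already been reduced to positions: resolve the key to a count of selected columns first, then compare that count to 1. The functions `count_selected_columns` and `is_multi_col_set` below are hypothetical stand-ins rather than the code in this PR, and the query compiler case is left out.

```python
from collections.abc import Sized
from typing import Any


def count_selected_columns(columns: Any, num_frame_columns: int) -> int:
    """Return how many columns a positional key selects (hypothetical helper)."""
    if isinstance(columns, slice):
        # slice.indices clamps start/stop/step to the frame's width, so
        # slice(None) resolves to every column and slice(1, 2) to just one.
        return len(range(*columns.indices(num_frame_columns)))
    if isinstance(columns, Sized) and not isinstance(columns, str):
        # Lists, tuples, and other sized containers of column keys.
        return len(columns)
    # Anything else is treated as a single scalar key.
    return 1


def is_multi_col_set(columns: Any, num_frame_columns: int) -> bool:
    """True only when the key really selects more than one column."""
    return count_selected_columns(columns, num_frame_columns) > 1
```

With this shape, slice(1, 2) on any frame and slice(None) on a one-column frame both count as single-column sets, which a bare isinstance(columns, slice) check would treat as multi-column.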

Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
4 participants