[SNOW-1649172]: Fix `loc` set when setting DataFrame row with Series value #2213
base: main
# Conflicts: # CHANGELOG.md # src/snowflake/snowpark/modin/pandas/series.py # tests/integ/modin/frame/test_loc.py
src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py
Please describe what is the problem.
Please describe what was the issue?
original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1):  # type: ignore[arg-type]
Can you wrap this in a function and use the function name to summarize what it does?
What does `(columns == slice(None) or len(columns) > 1)` mean?
This `type: ignore[arg-type]` actually indicates that something is wrong. You didn't consider all type cases.
I believe this is checking whether more than one column is being set. As for the arg-type, I think that is because it's ignoring the case where `columns` is a `SnowflakeQueryCompiler`? I've added a test for that case, and will fix it!
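To illustrate the concern about the `# type: ignore[arg-type]`, here is a minimal plain-Python sketch, not the PR's code. `naive_is_multi_col` mirrors the condition under review, and `is_multi_column_key` is a hypothetical variant that branches on the key's type before calling `len`. The `SnowflakeQueryCompiler` case is not modeled here, and treating every slice as potentially multi-column is an assumption.

```python
from collections.abc import Sized


def naive_is_multi_col(columns):
    # Mirrors the reviewed condition: len() is called on whatever `columns` is,
    # even when that object has no length.
    return columns == slice(None) or len(columns) > 1


def is_multi_column_key(columns: object) -> bool:
    """Hypothetical type-safe variant; the name is made up for this sketch."""
    if isinstance(columns, slice):
        # Conservative assumption: a slice may span several columns.
        return True
    if isinstance(columns, (str, bytes)):
        # A string label is Sized, but it names a single column.
        return False
    if isinstance(columns, Sized):
        return len(columns) > 1
    # Any other scalar label selects one column.
    return False


# The naive check raises for keys that have no len(), such as a
# non-trivial slice or an integer label.
for key in (slice("a", "b"), 3):
    try:
        naive_is_multi_col(key)
    except TypeError as exc:
        print(f"naive check fails for {key!r}: {exc}")
```

Branching with `isinstance` lets a type checker narrow `columns` before `len` is called, so the ignore comment is no longer needed in this sketch.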
item, col_len, move_index_to_cols=True
)

if is_scalar(index):
what happens if index is not scalar?
If `index` is not scalar, we don't have to append it to the item to match the index; it should be either `slice(None)` or an `InternalFrame`, both of which we handle in the rest of the method.
original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1):  # type: ignore[arg-type]
This should be done in _set_2d_labels_helper_for_frame_item
I actually think it needs to be done in this method, since we need to modify `item` before the map is created (which is passed into `_set_2d_labels_helper_for_frame_item`), and we need the modified item later on in this method.
I can move this into the conditional for `if item_is_frame` though!
Will add additional tests once the match by position or labels issue is resolved: https://snowflake.slack.com/archives/C04HF38JFAQ/p1727828020400139?thread_ts=1727824503.275869&cid=C04HF38JFAQ
LGTM. Thanks for addressing the comments.
@@ -1954,6 +1955,82 @@ def _set_2d_labels_helper_for_single_column_wise_item(
).result_frame


def _convert_series_item_to_row_for_set_frame_2d_labels(
Can you move the index operation to another helper function (or just outside of this one)? The name of this function doesn't say anything about changing the index.
)
return end - start

if columns == slice(None):
This can be a helper function like `get_column_length` in indexing_utils.py.
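As a rough idea of what such a helper could look like, here is a plain-Python sketch. It is not the PR's implementation: the signature, the `frame_columns` parameter, and the handling of label slices are assumptions, and the `SnowflakeQueryCompiler` branch visible in the diff is not modeled. It assumes unique column labels and pandas-style label slices, which include both endpoints.

```python
from collections.abc import Hashable, Sequence, Sized


def get_column_length(columns_key: object, frame_columns: Sequence[Hashable]) -> int:
    """Hypothetical helper: count the columns that a loc-style key selects."""
    labels = list(frame_columns)
    if isinstance(columns_key, slice):
        # Label slices include both endpoints; a missing bound means "to the edge".
        start = 0 if columns_key.start is None else labels.index(columns_key.start)
        stop = len(labels) - 1 if columns_key.stop is None else labels.index(columns_key.stop)
        return len(labels[start : stop + 1 : columns_key.step])
    if isinstance(columns_key, (str, bytes)) or not isinstance(columns_key, Sized):
        # A single scalar label selects exactly one column.
        return 1
    # List-like of labels.
    return len(columns_key)


cols = ["a", "b", "c"]
print(get_column_length(slice(None), cols))      # every column
print(get_column_length(slice("b", "b"), cols))  # a slice that selects one column
print(get_column_length(["a", "c"], cols))       # list of labels
```

Keeping the counting logic in one function gives the calling code a single `col_len` value, whatever form the column key takes.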
else:
    col_len = len(columns.index)

if isinstance(columns, SnowflakeQueryCompiler):
Can you add comments about what you are trying to do here and on the next line?
)

if is_scalar(index):
    new_item = item.append_column("__index__", pandas_lit(index))
You should use an API in SQC to set the index or reindex. Manually setting the column can lead to bugs. Once you have `new_item_sqc`, you can set `item = new_item_sqc._modin_frame`.
# across columns rather than rows.
is_multi_col_set = (
    (isinstance(columns, Sized) and len(columns) > 1)
    or isinstance(columns, slice)
The slice and QC cases can select a single column too, right?
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1649172
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
When doing `df.loc[x] = series`, an error occurs because `series` does not have the same number of columns as the DataFrame being set. Instead, the Series should be transposed and set as a row, regardless of whether its number of rows equals the DataFrame's number of columns.
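For reference, native pandas accepts this kind of assignment and lays the Series out across the row, matching the Series index to the column labels. The sketch below uses plain pandas, not Snowpark pandas, and the frame and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series indexed by column label, deliberately out of column order.
row = pd.Series([20, 10], index=["b", "a"])

# The Series is applied across the columns of row 0, aligned by label.
df.loc[0] = row

print(df.loc[0, "a"], df.loc[0, "b"])  # 10 20
print(df.loc[1].tolist())              # row 1 is untouched: [2, 4]
```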