-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SNOW-990542] When converting snowpark dataframe to pandas, cast decimal columns to float type #1201
Conversation
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
2ac08bb
to
0a59f5b
Compare
0a59f5b
to
77dab49
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nits
tests/integ/test_df_to_pandas.py
Outdated
@@ -141,6 +141,28 @@ def test_to_pandas_precision_for_number_38_0(session): | |||
assert pdf["A"].min() == -9223372036854775808 | |||
|
|||
|
|||
def test_to_pandas_precision_for_number_38_6_and_others(session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename this test to test_to_pandas_precision_for_non_zero_scale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Please also fix the test:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are also modifying the scope of the function, we should rename _fix_pandas_df_integer
-> _fix_pandas_df_fixed_type
I'm a bit confused why I have this error... |
I'm confused as well because I saw the stacktrace has |
I have read the CLA Document and I hereby sign the CLA |
Yeah, that's my original test. I guess it's because pandas column name needs to be upper case. So I modified the column name in the test. |
@sfc-gh-aalam Do you think if we should do |
# recognize decimal type. | ||
pd_df[pandas_col_name] = pd_df[pandas_col_name].astype("float64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a hard astype to float64, we should prefer using to_numeric downcast="float" in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, after changing to downcast="float"
, certain cases (column C
in the test) still return object
(dtype('O')
). It's unclear to me if this is expected. Do we want to force a "float64" if the column cannot be casted to float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me. What do you think? @sfc-gh-aalam
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that is the only option we have. We should run this by @sfc-gh-yixie before merging.
This reverts commit 21441fd.
# For decimal columns, we want to cast it into float64 because pandas doesn't | ||
# recognize decimal type. | ||
pandas.to_numeric(pd_df[pandas_col_name], downcast="float") | ||
if pd_df[pandas_col_name].dtype == "O": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my education, when is a pandas column dtype "0"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is O
not 0
LOL. It is when the dtype is an object.
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes #SNOW-990542
Pandas doesn't recognize decimal type. When we convert snowpark dataframe to pandas dataframe, the decimal columns are not handled correctly, and thus result in issues.
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
When we convert snowpark dataframe to pandas dataframe, we cast decimal columns into float64.
For more details, please see [thread]