-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String dtype (2.3.x): avoid downcasting object to string in fillna/where/interpolate #60183
base: 2.3.x
Are you sure you want to change the base?
String dtype (2.3.x): avoid downcasting object to string in fillna/where/interpolate #60183
Conversation
@@ -2741,7 +2742,11 @@ def maybe_convert_objects(ndarray[object] objects, | |||
seen.object_ = True | |||
|
|||
elif seen.str_: | |||
if using_string_dtype() and is_string_array(objects, skipna=True): | |||
if ( | |||
convert_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases would we have convert_string
be False and using_string_dtype
be True? Not saying this implementation is wrong, just hard to grok at first glance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This combination of convert_string=False
with using_string_dtype()
being True is exactly what I am using inside the replace/fillna implementation, to pass down to maybe_convert_objects
that I don't want to cast object to string if I know that we started with object dtype, and that this dtype should be preserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I fully agree it is all a bit confusing, and the implementation is not very clear (I mostly got to the current state just by getting all our tests to pass, but it is difficult to assess whether it is complete. However, given that this is about "not raising a useless warning", I think it is not the worst thing if we would have missed a case where we would still raise the warning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK thanks that is helpful. As a nit, maybe preserve_object_dtype
would be a better keyword, but not something I think is a blocker. This all needs a good cleanup past once we get through the 2.3 push anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a nit, maybe
preserve_object_dtype
would be a better keyword
It's only specifically not converting object to string here (all other inferred types are still used), so preserve_object_dtype
would be a bit too generic I think
Noticed while backporting some test updates in #60180.
For 2.x, we have a deprecation warning about "downcasting" in methods like fillna and interpolate. For example, when starting from object dtype
However, because of how this is implemented, when you enable the future string dtype and still have an object dtype column, this will be inferred as string dtype and hence again trigger that warning:
So this triggers a warning for something that will no longer happen in 3.0, but when testing with
pd.options.future.infer_string = True
, we are essentially already testing behaviour for 3.0, so raising this warning results in a warning the user won't actually see in pandas 3.0.So I was thinking that we could explicitly not dowcast to string dtype in those methods, since for the new dtype we don't need to keep backwards compatibility with old downcasting behaviour.
This PR is directly targetting the
2.3.x
branch, given that all this downcasting behaviour is already removed on main.xref #54792