String dtype (2.3.x): avoid downcasting object to string in fillna/where/interpolate #60183

jorisvandenbossche · 2024-11-04T13:11:13Z

Noticed while backporting some test updates in #60180.

For 2.x, we have a deprecation warning about "downcasting" in methods like fillna and interpolate. For example, when starting from object dtype:

>>> ser = pd.Series([1, 2, None], dtype=object)
>>> ser.fillna(3)
FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change
in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior,
set `pd.set_option('future.no_silent_downcasting', True)`
0    1
1    2
2    3
dtype: int64
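As a rough mental model of this 2.x behaviour, here is a hypothetical pure-Python sketch. It is not pandas' implementation: `fill_object_values` is an illustrative name, its `no_silent_downcasting` flag stands in for the `future.no_silent_downcasting` option, and dtype inference is reduced to "all ints gives int64, anything else stays object".

```python
import warnings


def _infer_dtype(values):
    # Toy inference: only an all-integer column is "downcast"; bools excluded.
    if values and all(type(v) is int for v in values):
        return "int64"
    return "object"


def fill_object_values(values, fill, *, no_silent_downcasting=False):
    """Hypothetical model of Series.fillna on an object-dtype column.

    Returns (filled_values, result_dtype).
    """
    filled = [fill if v is None else v for v in values]
    if no_silent_downcasting:
        # Opted in to the future behaviour: the object dtype is preserved.
        return filled, "object"
    inferred = _infer_dtype(filled)
    if inferred != "object":
        # 2.x behaviour: downcast, but warn that this is deprecated.
        warnings.warn(
            "Downcasting object dtype arrays on .fillna is deprecated",
            FutureWarning,
            stacklevel=2,
        )
    return filled, inferred
```

With the flag off, `fill_object_values([1, 2, None], 3)` returns `([1, 2, 3], 'int64')` and emits a FutureWarning; with `no_silent_downcasting=True` it returns `([1, 2, 3], 'object')` without a warning. In real pandas, the warning message suggests calling `result.infer_objects(copy=False)` to get the inferred dtype explicitly.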

However, because of how this is implemented, when you enable the future string dtype and still have an object dtype column, the filled result will be inferred as string dtype, which again triggers that warning:

>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b", None], dtype=object)
>>> ser.fillna("c")
FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change
in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior,
set `pd.set_option('future.no_silent_downcasting', True)`
0    a
1    b
2    c
dtype: str

So this triggers a warning about something that will no longer happen in 3.0. When testing with pd.options.future.infer_string = True, we are essentially already testing the 3.0 behaviour, so we end up raising a warning that users won't actually see in pandas 3.0.

So I was thinking that we could explicitly not downcast to string dtype in those methods, since for the new dtype we don't need to keep backwards compatibility with the old downcasting behaviour.
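A hypothetical pure-Python sketch of the proposed rule, assuming a simplified inference step: `fill_object_values` is an illustrative name, `infer_string` stands in for `pd.options.future.infer_string`, and `convert_string` borrows the keyword name from the patch (a caller that starts from object dtype would pass `False`). Only the object-to-string inference is switched off; an all-integer result is still downcast and still warns.

```python
import warnings


def _infer_dtype(values, *, infer_string, convert_string):
    # Simplified inference over the non-missing values.
    present = [v for v in values if v is not None]
    if present and all(type(v) is int for v in present):
        return "int64"
    if present and all(isinstance(v, str) for v in present):
        # Only infer the new string dtype if the future option is enabled
        # and the caller has not disabled string conversion.
        if infer_string and convert_string:
            return "str"
    return "object"


def fill_object_values(values, fill, *, infer_string, convert_string=True):
    """Hypothetical model of fillna on an object-dtype column.

    Returns (filled_values, result_dtype) and warns whenever the result
    is "downcast" away from object dtype.
    """
    filled = [fill if v is None else v for v in values]
    result = _infer_dtype(
        filled, infer_string=infer_string, convert_string=convert_string
    )
    if result != "object":
        warnings.warn(
            "Downcasting object dtype arrays on .fillna is deprecated",
            FutureWarning,
            stacklevel=2,
        )
    return filled, result
```

Filling `["a", "b", None]` with `"c"` under `infer_string=True` gives dtype `"str"` plus a FutureWarning; adding `convert_string=False` keeps `"object"` with no warning, while `[1, 2, None]` filled with `3` still becomes `"int64"` and warns.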

This PR directly targets the 2.3.x branch, given that all this downcasting behaviour has already been removed on main.

xref #54792


WillAyd · 2024-11-05T15:18:27Z

pandas/_libs/lib.pyx

@@ -2741,7 +2742,11 @@ def maybe_convert_objects(ndarray[object] objects,
        seen.object_ = True

    elif seen.str_:
-        if using_string_dtype() and is_string_array(objects, skipna=True):
+        if (
+            convert_string


WillAyd: In what cases would we have convert_string be False and using_string_dtype() be True? Not saying this implementation is wrong, it is just hard to grok at first glance.

jorisvandenbossche: This combination of convert_string=False with using_string_dtype() being True is exactly what I use inside the replace/fillna implementation: it tells maybe_convert_objects not to cast object to string when we know we started with object dtype and that this dtype should be preserved.

But I fully agree it is all a bit confusing, and the implementation is not very clear. I mostly got to the current state by getting all our tests to pass, so it is difficult to assess whether it is complete. However, given that this is only about not raising a useless warning, it would not be the worst thing if we missed a case where the warning is still raised.
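To make the flag combination concrete, here is a hypothetical Python rendering of the `elif seen.str_:` branch (the real code is Cython in pandas/_libs/lib.pyx). The diff excerpt above is cut off after `convert_string`, so this assumes the new condition is `convert_string` combined with the two original checks; `string_branch_dtype` and its plain-boolean `using_string_dtype` argument are illustrative stand-ins.

```python
def string_branch_dtype(objects, *, using_string_dtype, convert_string=True):
    """Sketch of the dtype chosen once inference has seen a string.

    `using_string_dtype` stands in for pandas' using_string_dtype() check,
    and the all(...) expression below for is_string_array(objects, skipna=True).
    """
    # Missing values (None here) are skipped, as with skipna=True.
    all_strings = all(isinstance(o, str) for o in objects if o is not None)
    if convert_string and using_string_dtype and all_strings:
        return "str"  # infer the new string dtype
    return "object"  # keep object dtype
```

The only combination that behaves differently from before is `convert_string=False` with `using_string_dtype=True`: a caller such as fillna/replace that started from object dtype gets `"object"` back, so nothing is downcast and no warning is raised.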

WillAyd: Ah OK, thanks, that is helpful. As a nit, maybe preserve_object_dtype would be a better keyword, but I don't think that is a blocker. This all needs a good cleanup pass once we get through the 2.3 push anyway.

jorisvandenbossche:

> As a nit, maybe preserve_object_dtype would be a better keyword

It only specifically avoids converting object to string here (all other inferred types are still used), so preserve_object_dtype would be a bit too generic, I think.

String dtype (2.3.x): avoid downcasting object to string in fillna/where/interpolate (d95620b)

jorisvandenbossche added this to the 2.3 milestone Nov 4, 2024

jorisvandenbossche mentioned this pull request Nov 4, 2024

[backport 2.3.x] TST (string dtype): un-xfail string tests specific to object dtype (#59433) #60180 (Open)

jorisvandenbossche added 4 commits November 5, 2024 09:54

still raise a warning when it would cast from numeric to string (dd0a6a2)

Merge remote-tracking branch 'upstream/2.3.x' into string-dtype-2.3.x-downcast-string (2e65248)

update typing (850573d)

try fix categorical case (60ff8f9)

jorisvandenbossche marked this pull request as ready for review November 5, 2024 13:00

jorisvandenbossche requested a review from WillAyd as a code owner November 5, 2024 13:00

WillAyd reviewed Nov 5, 2024


jorisvandenbossche added the Strings (String extension data type and string data) and Downcasting labels Nov 5, 2024

jorisvandenbossche mentioned this pull request Nov 15, 2024

[backport 2.3.x] TST (string dtype): resolve xfails for frame fillna and replace tests + fix bug in replace for string (#60295) #60331 (Draft)


Comment timeline: jorisvandenbossche commented Nov 4, 2024; review comments on Nov 5, 2024 by WillAyd, jorisvandenbossche, jorisvandenbossche, WillAyd, and jorisvandenbossche.
