subdtype #247

andrewgsavage · 2024-08-04T10:47:38Z

Closes Values dtype #192 float("nan") not always converted to pd.NA inside series with pint dtype #238 Constructor path unexpectedly changes outcome #205 Eqaulity checks when magnitudes have different dtypes #218
Executed pre-commit run --all-files with no errors
The change is fully covered by automated unit tests
Documented in docs/ as appropriate
Added an entry to the CHANGES file

andrewgsavage · 2024-08-05T10:01:13Z

This is ready for looking at now.
@MichaelTiemannOSC I think this is necessary to close the issues listed at the top. There aren't as many changes to the code as I'd expected. Also most initialisation paths are unaffected (only change is when constructing through Series or DataFrame, the subdtype needs to be specified or it'll default to Float64) see https://github.com/hgrecco/pint-pandas/pull/247/files#diff-ec91e8b7897fd22a3f890861665f6e7035e8a8a2dadd216dfeb998fa85290249

MichaelTiemannOSC · 2024-08-13T01:38:10Z

pint_pandas/pint_array.py

        if isinstance(values, _Quantity):
            values = values.to(dtype.units).magnitude
        elif isinstance(values, PintArray):
            values = values._data
-        if isinstance(values, np.ndarray):
-            dtype = values.dtype
-            if dtype in ddtypemap:


As mentioned after the fact with respect to #137, this particular type of dtype handling caused me some grief. Now that I've sorted out what sort of grief it caused me, I'll be ready to review these changes more properly.

it may be possible to remove ddtypemap with this change. the only other place it's used is in to_numpy which could defer to the underlying ExtensionArray

MichaelTiemannOSC · 2024-08-13T09:38:34Z

There are two ways that these changes mess with my world. The first is when I craft an ndarray of things that contain np.nan values and they get converted to NA values (because Float64). The second is when I try to construct arrays with uncertainty values (uncertainty Variables cannot be converted to Float64).

In both cases, I run into this when I use astype("pint[units]") and I need to use either astype("pint[units][float64]") or astype("pint[units][object]") when dealing with pure floats or uncertainties, respectively.

andrewgsavage · 2024-08-13T21:02:09Z

note you can change the default from Float64 to float64 or any other dtype pandas can parse by changing `pint_pandas.pint_array.DEFAULT_SUBDTYPE = "Float64"

pandas devs were surprised we are using float64 and not Float64- which is because it hasn't been changed since early on!

I think those changes to construction are good as they'll make it more unamibgious what the underlying data is.

The astype point is something I hadn't considered. That's more verbose and means code must change depending on the underlying dtype. I think DataFrame.pint.to("units") could be added as an alternative (and is 5 fewer characters, a double win!) That's been asked for elsewhere. I'll do that as a seperate PR.

andrewgsavage · 2024-08-13T21:04:31Z

There's a few things on the testing side to change/look at:

check tests run with other dtypes aren't getting converted to floats
add tests mixing dtypes

MichaelTiemannOSC · 2024-08-13T21:29:46Z

note you can change the default from Float64 to float64 or any other dtype pandas can parse by changing `pint_pandas.pint_array.DEFAULT_SUBDTYPE = "Float64"

I'm an age-old C programmer, and this is how I think. I but I know it's become more fashionable to use setters and getters to affect what might be a private variable. But yes, I can change the default to "float64" for my world, and then I can simplify my try statements to use the default and fallback to [object] when using uncertainties are present.

MichaelTiemannOSC · 2024-08-14T08:50:09Z

pint_pandas/pint_array.py

@@ -859,10 +890,17 @@ def _create_comparison_method(cls, op):
        return cls._create_method(op, coerce_to_dtype=False)

    @classmethod
-    def from_1darray_quantity(cls, quantity):
+    def from_1darray_quantity(cls, quantity, subdtype=None):


The addition of subdtype here is noted, but none of the callers of from_1darray_quantity make use of it. When I have two PintArrays (of 'EUR') for example, and I construct them as pint[EUR][float64] and then divide them to get a dimensionless ratio, the [float64] nature goes away and I get back a pint[dimensionless][Float64], which I don't want. The question is: should we fix all the callers of this function to pass the subdtype of the quantity as the subdtype value, and should that override the call on line 900 to pass the incoming subdtype of the pd.array call? Or should we fix this function to preserve the subdtype of the quantity.magnitude if the incoming subdtype is None and pass that on line 900 rather than rederiving it on line 901?

MichaelTiemannOSC · 2024-08-26T13:23:55Z

The latest changes still cause problems. At line 875 (function _binop, with coerce_to_dtype being True), we call cls.from_1darray_quantity(res). When res is a ndarray of float64, this takes us to line 900, which the call to pd.array returns as Float64 (pandas's preferred subdtype) rather than float64 (my program's preferred subdtype). A hack would be to test the special case in which we have an ndarray of float64 and if that is the preferred subdtype, to preserve that. Otherwise, let pd.array do what it does.

andrewgsavage · 2024-08-26T14:22:38Z

Ah I meant to go back and check that line. Does it behave correctly for you now?

MichaelTiemannOSC · 2024-08-26T14:36:01Z

pint_pandas/pint_array.py

@@ -872,7 +872,7 @@ def convert_values(param):

            if coerce_to_dtype:
                try:
-                    res = cls.from_1darray_quantity(res)
+                    res = cls.from_1darray_quantity(res, self.dtype.subdtype)


Passing self.dtype.subdtype here doesn't fix the problem that at line 901 pd.array can convert a float64 array into a Float64 array and line 902 will then take the subdtype from that, ignoring the subdtype passed in.

could you give a worked example? sounds like you've got a Quantity( NumpyExtensionArray([1.0, ...]) and I'm not sure how that's happened

Here's a worked example:

import pandas as pd import numpy as np import pint_pandas xx = pd.Series([1.0, 2.0, np.nan], dtype="pint[km][float64]") yy = pd.Series([2.0, 3.0, 4.0], dtype="pint[km][float64]") print(xx.fillna(123.456)) print(xx.fillna(xx + yy))

The conversion of xx+yy from pint[km][float64] to pint[km][Float64] makes it unsuitable as a further source to a fillna operation.

MichaelTiemannOSC · 2024-08-28T21:23:41Z

pint_pandas/pint_array.py

@@ -864,15 +864,22 @@ def convert_values(param):
            # a TypeError should be raised
            res = op(lvalues, rvalues)

+            subdtype = self.dtype.subdtype
+            if "truediv" in op.__name__ and subdtype.kind == "i":


In this case self can be:

<PintArray> [1581.2004963851048, 2322.7768972177428, 2065.2759837278822, 2108.0321074762774, 1922.714515192993] Length: 5, dtype: pint[USD * percent][float64]

self.dtype is pint[USD * percent][float64] and self.dtype.subdtype is 'float64'. But the string 'float64' does not have a kind attribute.

scanzy · 2024-09-16T08:38:22Z

Hey thank you for all your work on #247 !

So Float64 will be the new default sub-dtype?
However I find it really useful to have the ability to specify "pint[MW][Float64]" or "pint[MW][float64]" :)

subdtype

9e47e7a

andrewgsavage marked this pull request as draft August 4, 2024 10:47

andrewgsavage added 11 commits August 4, 2024 11:48

lint

09eb285

lint

8ec278e

tests

0e3a269

get tests working

f3d8f58

get tests working

a966311

get tests working

91da320

get tests working

06f64d3

get tests working

c605913

get tests working

320901a

docs

2c6a2a4

docs

c93407d

andrewgsavage requested a review from MichaelTiemannOSC August 5, 2024 09:51

andrewgsavage marked this pull request as ready for review August 5, 2024 09:51

Trigger Build

7939a26

docs

324f73b

andrewgsavage changed the title ~~[WIP] subdtype~~ subdtype Aug 5, 2024

MichaelTiemannOSC reviewed Aug 13, 2024

View reviewed changes

MichaelTiemannOSC reviewed Aug 14, 2024

View reviewed changes

andrewgsavage referenced this pull request Aug 26, 2024

137 pandas eval (#239)

899dd26

andrewgsavage added 3 commits August 26, 2024 10:04

1darray subdtype

c275f34

1darray subdtype

e385a0d

test more dtypes

1dcc66d

subdtypes coerce

6180e2a

MichaelTiemannOSC reviewed Aug 26, 2024

View reviewed changes

andrewgsavage added 2 commits August 27, 2024 20:04

subdtype

99b4c86

subdtype

bc66a5e

MichaelTiemannOSC reviewed Aug 28, 2024

View reviewed changes

andrewgsavage added 3 commits August 29, 2024 20:18

subdtype

0540dbe

subdtype

edb8567

subdtype

5eeb788

andrewgsavage mentioned this pull request Nov 11, 2024

clip fails on NA/nan values #255

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subdtype #247

subdtype #247

andrewgsavage commented Aug 4, 2024 •

edited

Loading

andrewgsavage commented Aug 5, 2024

MichaelTiemannOSC Aug 13, 2024

andrewgsavage Aug 13, 2024

MichaelTiemannOSC commented Aug 13, 2024

andrewgsavage commented Aug 13, 2024

andrewgsavage commented Aug 13, 2024

MichaelTiemannOSC commented Aug 13, 2024

MichaelTiemannOSC Aug 14, 2024 •

edited

Loading

MichaelTiemannOSC commented Aug 26, 2024

andrewgsavage commented Aug 26, 2024

MichaelTiemannOSC Aug 26, 2024

andrewgsavage Aug 26, 2024

MichaelTiemannOSC Aug 26, 2024

MichaelTiemannOSC Aug 28, 2024

scanzy commented Sep 16, 2024 •

edited

Loading

subdtype #247

Are you sure you want to change the base?

subdtype #247

Conversation

andrewgsavage commented Aug 4, 2024 • edited Loading

andrewgsavage commented Aug 5, 2024

MichaelTiemannOSC Aug 13, 2024

Choose a reason for hiding this comment

andrewgsavage Aug 13, 2024

Choose a reason for hiding this comment

MichaelTiemannOSC commented Aug 13, 2024

andrewgsavage commented Aug 13, 2024

andrewgsavage commented Aug 13, 2024

MichaelTiemannOSC commented Aug 13, 2024

MichaelTiemannOSC Aug 14, 2024 • edited Loading

Choose a reason for hiding this comment

MichaelTiemannOSC commented Aug 26, 2024

andrewgsavage commented Aug 26, 2024

MichaelTiemannOSC Aug 26, 2024

Choose a reason for hiding this comment

andrewgsavage Aug 26, 2024

Choose a reason for hiding this comment

MichaelTiemannOSC Aug 26, 2024

Choose a reason for hiding this comment

MichaelTiemannOSC Aug 28, 2024

Choose a reason for hiding this comment

scanzy commented Sep 16, 2024 • edited Loading

andrewgsavage commented Aug 4, 2024 •

edited

Loading

MichaelTiemannOSC Aug 14, 2024 •

edited

Loading

scanzy commented Sep 16, 2024 •

edited

Loading