-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
subdtype #247
base: master
Are you sure you want to change the base?
subdtype #247
Conversation
This is ready for looking at now. |
if isinstance(values, _Quantity): | ||
values = values.to(dtype.units).magnitude | ||
elif isinstance(values, PintArray): | ||
values = values._data | ||
if isinstance(values, np.ndarray): | ||
dtype = values.dtype | ||
if dtype in ddtypemap: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned after the fact with respect to #137, this particular type of dtype handling caused me some grief. Now that I've sorted out what sort of grief it caused me, I'll be ready to review these changes more properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be possible to remove ddtypemap
with this change. the only other place it's used is in to_numpy
which could defer to the underlying ExtensionArray
There are two ways that these changes mess with my world. The first is when I craft an ndarray of things that contain np.nan values and they get converted to NA values (because Float64). The second is when I try to construct arrays with uncertainty values (uncertainty Variables cannot be converted to Float64). In both cases, I run into this when I use |
note you can change the default from pandas devs were surprised we are using float64 and not Float64- which is because it hasn't been changed since early on! I think those changes to construction are good as they'll make it more unamibgious what the underlying data is. The |
There's a few things on the testing side to change/look at:
|
I'm an age-old C programmer, and this is how I think. I but I know it's become more fashionable to use setters and getters to affect what might be a private variable. But yes, I can change the default to "float64" for my world, and then I can simplify my try statements to use the default and fallback to |
@@ -859,10 +890,17 @@ def _create_comparison_method(cls, op): | |||
return cls._create_method(op, coerce_to_dtype=False) | |||
|
|||
@classmethod | |||
def from_1darray_quantity(cls, quantity): | |||
def from_1darray_quantity(cls, quantity, subdtype=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of subdtype here is noted, but none of the callers of from_1darray_quantity
make use of it. When I have two PintArrays (of 'EUR') for example, and I construct them as pint[EUR][float64]
and then divide them to get a dimensionless ratio, the [float64]
nature goes away and I get back a pint[dimensionless][Float64]
, which I don't want. The question is: should we fix all the callers of this function to pass the subdtype of the quantity as the subdtype value, and should that override the call on line 900 to pass the incoming subdtype
of the pd.array
call? Or should we fix this function to preserve the subdtype of the quantity.magnitude
if the incoming subdtype
is None and pass that on line 900 rather than rederiving it on line 901?
The latest changes still cause problems. At line 875 (function _binop, with coerce_to_dtype being True), we call |
Ah I meant to go back and check that line. Does it behave correctly for you now? |
pint_pandas/pint_array.py
Outdated
@@ -872,7 +872,7 @@ def convert_values(param): | |||
|
|||
if coerce_to_dtype: | |||
try: | |||
res = cls.from_1darray_quantity(res) | |||
res = cls.from_1darray_quantity(res, self.dtype.subdtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing self.dtype.subdtype
here doesn't fix the problem that at line 901 pd.array
can convert a float64
array into a Float64
array and line 902 will then take the subdtype
from that, ignoring the subdtype passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you give a worked example? sounds like you've got a Quantity( NumpyExtensionArray([1.0, ...])
and I'm not sure how that's happened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a worked example:
import pandas as pd
import numpy as np
import pint_pandas
xx = pd.Series([1.0, 2.0, np.nan], dtype="pint[km][float64]")
yy = pd.Series([2.0, 3.0, 4.0], dtype="pint[km][float64]")
print(xx.fillna(123.456))
print(xx.fillna(xx + yy))
The conversion of xx+yy
from pint[km][float64]
to pint[km][Float64]
makes it unsuitable as a further source to a fillna
operation.
pint_pandas/pint_array.py
Outdated
@@ -864,15 +864,22 @@ def convert_values(param): | |||
# a TypeError should be raised | |||
res = op(lvalues, rvalues) | |||
|
|||
subdtype = self.dtype.subdtype | |||
if "truediv" in op.__name__ and subdtype.kind == "i": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case self can be:
<PintArray>
[1581.2004963851048, 2322.7768972177428, 2065.2759837278822,
2108.0321074762774, 1922.714515192993]
Length: 5, dtype: pint[USD * percent][float64]
self.dtype is pint[USD * percent][float64]
and self.dtype.subdtype is 'float64'
. But the string 'float64'
does not have a kind
attribute.
Hey thank you for all your work on #247 ! So |
float("nan")
not always converted topd.NA
inside series with pint dtype #238 Constructor path unexpectedly changes outcome #205 Eqaulity checks when magnitudes have different dtypes #218pre-commit run --all-files
with no errors