Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AffineScalarFunc hashable for Pandas, Pint and Pint-Pandas #170

Closed

Conversation

MichaelTiemannOSC
Copy link

Add hash to AffineScalarFunc so it works with Pandas, Pint, and Pint-Pandas.

Part of pandas-dev/pandas#53970
and hgrecco/pint#1615
and hgrecco/pint-pandas#140

@jbrockmendel @hgrecco @andrewgsavage

Add __hash__ so it works with Pandas, Pint, and Pint-Pandas

Signed-off-by: Michael Tiemann <[email protected]>
@@ -1849,6 +1849,10 @@ def std_dev(self):
# Abbreviation (for formulas, etc.):
s = std_dev

def __hash__(self):
# Placeholder until we figure out how to really make these hashable
return id(self)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this satisfy the invariance x == y \Rightarrow hash(x) == hash(y)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some tests:

u = ufloat(1, 2)
v = ufloat(1, 2)
w = u+u
z = w / 2

# u != v, expected
# u == z, expected
# id(u) != id(v) != id(z)

a = float(1+2)
b = float(2+1)
c = a+a
d = c/2

# a == b, expected
# a == d, expected
# b == d, expected
# id(a) != id(b) != id(d)

I'll see what I can come up with to satisfy the invariant. Thanks!

Tweak hash function so that when x==y, hash(x)==hash(y)

Test case:

```
import uncertainties
from uncertainties import ufloat

u = ufloat(1.23, 2.34)
v = ufloat(1.23, 2.34)
print(f"u{u} == v{v}: {u==v}")
print(f"hash(u){hash(u)} == hash(v){hash(v)}: {hash(u)==hash(v)}")
print(f"u{u} == (u+u)/2{(u+u)/2}: {u==(u+u)/2}")
print(f"hash(u){hash(u)} == hash((u+u)/2){hash((u+u)/2)}: {hash(u)==hash((u+u)/2)}")
```

Signed-off-by: Michael Tiemann <[email protected]>
@MichaelTiemannOSC
Copy link
Author

While I solved the invariant problem, when I ran pytest the pickle test failed. All others passed. I'll see what I can do about that...

Ensure that we have a linear_combo attribute before trying to look up its keys.  Also re-indent so our conditional test is not too long, and add test case to test_uncertainties.

Now test_uncertainties passes all 31 tests original tests plus the new one, and test_umath passes all 9 tests.

Signed-off-by: Michael Tiemann <[email protected]>
@@ -2404,3 +2404,21 @@ def test_correlated_values_correlation_mat():
assert arrays_close(
numpy.array(cov_mat),
numpy.array(uncert_core.covariance_matrix([x2, y2, z2])))

def test_hash():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any NaN-likes that will have x!=x? if you want those to be considered "matching" for e.g. dict lookups (and pandas Index lookups) those will need matching hashes

Copy link
Author

@MichaelTiemannOSC MichaelTiemannOSC Jul 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that there is no canonical NaN in the uncertainties package:

import numpy as np
import uncertainties
from uncertainties import ufloat

xx = ufloat(np.nan, 0)
yy = ufloat(np.nan, 1)

# nan+/-0
print(xx + xx)
# nan+/-2
print(yy + yy)

# False, because nan != nan
print(xx == xx)

The uncertainties package has unumpy.isnan which works for both scalar and arrays. There's no dictionary compare against a nan. The PintArray type uses unp.isnan to check whether something is a nan or not. The pandas changes all use PintArray.isna to test NA-ness (including NaN-ness). But if I'm missing something, please let me know!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now changed the Pint-Pandas changes to use np.nan instead of pd.NA. While I was able to get everything to work with pd.NA, things were cleaner using np.nan as the NA value for uncertain values.

Call `expand` directly if self._linear_part is not yet expanded.

Signed-off-by: Michael Tiemann <[email protected]>
Attempt to work around old version of nose that doesn't work with later versions of Python.  Will it work?

Signed-off-by: Michael Tiemann <[email protected]>
Attempt replacing `nose` (which has not been regularly supported for years) with `pytest` for better CI/CD experience.

Signed-off-by: Michael Tiemann <[email protected]>
Try running nosetests directly, rather than via setuptools, which trips up nose-1.3.7 for python >= 3.6.

See nose-devs/nose#873

Signed-off-by: Michael Tiemann <[email protected]>
Install `future` to satisfy `builtins` for Python 2.7.

Signed-off-by: Michael Tiemann <[email protected]>
@MichaelTiemannOSC
Copy link
Author

All changes needed for Pandas 2.1 are now present and Pint/Pint-Pandas changes are ready for review. Could we get a fresh release with hashable AffineScalarFunc?

@NelDav
Copy link

NelDav commented Jan 23, 2024

I made similar changes in #189, just because I haven't seeen this MR before. I do not have any clue about uncertainties, but wouldn't it be necessary, to check the correlations as well? You may check my MR, I tried to do it there.

@MichaelTiemannOSC
Copy link
Author

I made similar changes in #189, just because I haven't seeen this MR before. I do not have any clue about uncertainties, but wouldn't it be necessary, to check the correlations as well? You may check my MR, I tried to do it there.

@NelDav nice to meet you! The correlations are checked by expanding out the _linear_part. I created some test cases that demonstrate what I understand to be proper equality/inequality relationships. If your test case passes those, then great, because your hash function is simpler than mine. But you need to test all the corner cases you can think of as part of your PR. A nifty hash function alone is not enough.

@NelDav
Copy link

NelDav commented Jan 25, 2024

Hi @MichaelTiemannOSC, thank you for your answer. I merged your branch into mine an checked my hash function with your tests. Indeed, a few problems occoured. I then modified my hash calculation and now your tests are pasing. I think the new version is still simpler than your solution. However, I am far from an expert on how to calculate with uncertainties and how identity verification should be carried out. It would be very nice if you could look at my new proposal and share your impressions.

@jagerber48
Copy link
Contributor

See my comment on the other PR: #219 (comment)

Closing this PR in favor of #219. Though that one will likely be closed as well since I think we discovered implementing a hash requires a bit of a larger refactor of the AffineScalarFunc and LinearCombination code than simply slapping a __hash__ function somewhere.

@jagerber48 jagerber48 closed this Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants