Add support for UFloat in PintArray (#139) #140

MichaelTiemannOSC · 2022-10-15T19:28:23Z

Signed-off-by: MichaelTiemann [email protected]

Closes Pint-Pandas support for uncertain Quantities #139
Executed pre-commit run --all-files with no errors
The change is fully covered by automated unit tests
Documented in docs/ as appropriate
Added an entry to the CHANGES file

Signed-off-by: MichaelTiemann <[email protected]>

MichaelTiemannOSC · 2022-10-15T19:29:43Z

As noted, this change request may precipitate changes needed in how Pint deals with uncertainties (whose representation is not supported by vanilla Python parsers).

andrewgsavage · 2022-10-16T15:30:57Z

This lets you store ufloat objects, but I'm not sure how well pandas will for operations like reshaping, indexing etc. I suspect some will work but some won't.

I think you can add ufloats to fixtures in test_pandas_extensiontests and it'll run with standard and ufloat quantities. This should show what works and doesn't.

I think you should make it so that it only stores floats or ufloats in a PintArray. There's a few points where arrays are expected to contain the same object in them, and I think pandas expects each extensiontype to only store one type of object.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

MichaelTiemannOSC · 2022-10-16T18:06:40Z

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

@pytest.fixture
def data():
    return PintArray.from_1darray_quantity(np.arange(start=1.0, stop=101.0) * ureg.nm)


@pytest.fixture
def data_missing():
    return PintArray.from_1darray_quantity([np.nan, 1] * ureg.meter)


@pytest.fixture
def data_for_twos():
    x = [
        2.0,
    ] * 100
    return PintArray.from_1darray_quantity(x * ureg.meter)

to ufloat-based values? I'm going to give that a try, but if there's a better way to do it (or a you meant something different) please guide me. Thanks!

MichaelTiemannOSC · 2022-10-16T18:54:16Z

Progress...I've now got 203 failures, the vast majority of which are due to the fact that ufloats that are computed to be the same value, but not the same variable, do not compare as equal, so all the assert_series_equal, assert_extension_array_equal, assert_frame_equal fail, through no fault of their own. It will be quite some work to plug in an appropriate "assert_*_nominally_equal` everywhere.

I do not know how to make progress on this:

__________________________________________________________________________________________________________________ ERROR at setup of TestMethods.test_insert_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py, line 558
      def test_insert_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting\
, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, tmp_path, tmp_p\
ath_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/methods.py:558
_________________________________________________________________________________________________________________ ERROR at setup of TestSetitem.test_setitem_invalid __________________________________________________________________________________________________________________
file /Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py, line 437
      def test_setitem_invalid(self, data, invalid_scalar):
E       fixture 'invalid_scalar' not found
>       available fixtures: all_arithmetic_operators, all_boolean_reductions, all_compare_operators, all_data, all_numeric_reductions, as_array, as_frame, as_series, box_in_series, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, data, data_for_grouping, data_for_sorting\
, data_for_twos, data_missing, data_missing_for_sorting, data_repeated, doctest_namespace, dtype, fillna_method, full_indexer, groupby_apply_op, monkeypatch, na_cmp, na_value, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, sort_by_key, t\
mp_path, tmp_path_factory, tmpdir, tmpdir_factory, use_numpy
>       use 'pytest --fixtures [testpath]' for help on them.

/Users/michael/opt/miniconda3/envs/pandas/lib/python3.9/site-packages/pandas/tests/extension/base/setitem.py:437

Also: 191 passed, 1 skipped, 69 xfailed, 5 xpassed

andrewgsavage · 2022-10-16T21:03:00Z

I'm relatively new to all the pytest stuff in pint/pandas, etc. Am I understanding correctly that I should modify

yes like that, there are some fixtures that return a value from a list, like all_arithmetic_operators. I'd try changing these so they test both ufloat and float.

andrewgsavage · 2022-10-16T21:12:42Z

Use this fork of pandas for the assert...equal tets to work

pint-pandas/.github/workflows/ci.yml

Line 11 in 236cf0f

pandas: ["git+https://github.com/andrewgsavage/pandas.git@assert_almost_equal"]

edit: I'm not 100% sure this will work, it makes the tests work for quantity but you might have a different issue there

andrewgsavage · 2022-10-16T21:26:34Z

fixture 'invalid_scalar' not found

that's coming from a newer pandas with extra tests compared pandas 1.4x, you can rebase from my #133 if you're using a higher version... but for now I wouldn't worry about those errors

MichaelTiemannOSC · 2022-10-17T02:48:09Z

More progress...down to essentially one kind of failure--the un-hashabiliity of UFloat. I don't have the Python sophistication to know what the next step would be, but I'll push the changes I have thus far.

=== short test summary info ================================================================================
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_agg[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_agg_extension - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="B") are different
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_no_sort - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_transform - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[scalar] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[list] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_groupby_extension_apply[object] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestGroupby::test_in_numeric_groupby - AssertionError: Did not see expected warning of class 'FutureWarning'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestInterface::test_contains - AssertionError
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts_with_normalize - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_sort_values_frame[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-1] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_factorize_equivalence[-2] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_hash_pandas_object_works[False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-True] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_value_counts[data_missing-False] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[<lambda>-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-Series] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_unique[unique-<lambda>] - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod_series_array - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestArithmeticOps::test_divmod - TypeError: ufunc 'divmod' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestReshaping::test_merge_on_extension_array_duplicates - TypeError: unhashable type: 'AffineScalarFunc'
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_scalar_key_sequence_raise - Failed: DID NOT RAISE <class 'ValueError'>
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_frame_2d_values - AssertionError: Caused unexpected warning(s): [('UnitStrippedWarning', UnitStrippedWarning('The unit of the quantity is stripped when downcasting to ndarray.'), ...
FAILED pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_delitem_series - TypeError: unhashable type: 'AffineScalarFunc'
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestMethods::test_insert_invalid
ERROR pint-pandas/pint_pandas/testsuite/test_pandas_extensiontests.py::TestSetitem::test_setitem_invalid
==== 35 failed, 331 passed, 48 xfailed, 26 xpassed, 268 warnings, 2 errors in 15.68s =======================

MichaelTiemannOSC · 2022-10-17T18:36:44Z

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

MichaelTiemannOSC · 2022-10-17T19:01:51Z

I think you should make it so that it only stores floats or ufloats in a PintArray. There's a few points where arrays are expected to contain the same object in them, and I think pandas expects each extensiontype to only store one type of object.

I'll give that a try. While I'm not sure it's toally necessary, I think it's a good exercise.

I also think (for the time being) you should make it only allow floats or ufloats globally, as there's no way to distinguish between a pint[m] (float) or pint[m] (ufloat) type. This is a wider issue that also applies to ints or other numeric types.

It will certainly keep the discussion with Pandas on point to present uniform ExtensionTypes and to keep both Pint and Pandas on the straight-and-narrow there.

andrewgsavage · 2022-10-17T20:56:10Z

Does it make sense to rebase PintArrays on uarrays

you can try, it's easy to check. I don't know what effects that'd have. Numpy is used accross pandas so would still be used.

Big remaining problem is unhanshable UFloat type. Signed-off-by: MichaelTiemann <[email protected]>

Signed-off-by: [email protected]

MichaelTiemannOSC · 2022-10-21T11:00:23Z

pint_pandas/testsuite/test_pandas_extensiontests.py

@@ -24,6 +33,120 @@

 ureg = PintType.ureg

+import pandas._testing as tm


A note about these changes: when comparing UFloats, if the .s (std_dev) terms zero, then two numbers with the same .n (nominal_value) terms will compare equal. However, if there are uncertainties in the numbers, even identical uncertainties, they will not compare equal unless they are identically the same uncertain variable. Since the test suite typically finds two ways to do the same thing and then compare the results, this leads to UFloats comparing not equal unless the uncertainties are zero.

I started making chnages to try to compare UFloats by comparing their .n and .s separately (not using ==), but this is a real hassle, as there are many, many cases to test and unpack. For my own sanity, I simplified the task by using only UFloats with uncertainty values of zero, allowing me to test float vs. UFloat handling without the spurious problems with ufloat(a) =?= ufloat(b).

I've left the code I started writing for the more general relaxation of ufloat comparison, but it shouldn't actually be needed if we keep our unfloat uncertainty at zero (which is presently the case). So we can either remove, improve, or defer dealing with uassert_equal and friends. LMK

Signed-off-by: MichaelTiemann <[email protected]>

Big remaining problem is unhanshable UFloat type. Signed-off-by: MichaelTiemann <[email protected]>

Signed-off-by: [email protected]

Signed-off-by: MichaelTiemann <[email protected]>

…C/pint-pandas into ducks-unlimited Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC · 2022-11-03T16:24:41Z

I thought it would be helpful to me to merge in changes that have been merged upstream, not realizing how that would affect the size of the PR. If I should revert to the previous version so that the PR has only my changes, I'll do that. I'm still learning my way around git, and everything new I do is truly a "learning experience".

Deal with duality of np.nan and _ufloat_nans when constructing PintArrays, creating float64 magnitude arrays when possible and ufloat magnitudes when necessary. Note that PintArray columns don't advertise whether magnitudes are np.float64 or np.obejct; they are what they are. Also commented out warning, which simply creates noise. Signed-off-by: Michael Tiemann <[email protected]>

Fix conditional expression. Signed-off-by: Michael Tiemann <[email protected]>

Don't try to build a large-enough-to-hold-uncertainties array if we are not using uncertainties in any way. Signed-off-by: Michael Tiemann <[email protected]>

Allow pip install command to unpack uncertainties option into multiple arguments. Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC · 2023-09-18T01:39:35Z

Thanks @hgrecco for creating Pint-0.23rc0. I've attempted to write some rules to optionally test uncertainties, but I don't know the right way to leave the normal Pint-Pandas matrix to do its thing, and then create an uncertainties test that uses Pandas-2.1.0 and Pint-0.23rc0. The approach I took simply overrode Pandas and Pint every time. Maybe I need to add a null string "" and then test for the null string? My yaml/shell scripting is fairly primitive.

In any case, if I can get the CI/CD working as it should, then these changes should be ready to go for a pint_pandas-0.6rc0.

By using `include` directive, we can add a special case to the matrix of possibilities. In this case we add to the matrix the case where pandas>=2.1.0 and pint>=0.23rc0, which are both necessary for the UFloat array changes to work. Signed-off-by: Michael Tiemann <[email protected]>

Don't test uncertainties in ci-pint-pre or ci-pint-master, as these don't have the necessary versions of Pandas or Pint to make the matrix include operation work. Signed-off-by: Michael Tiemann <[email protected]>

Also test Pint-Pandas without uncertainties. Signed-off-by: Michael Tiemann <[email protected]>

Trying again with different syntax to avoid duplicate key issue. Signed-off-by: Michael Tiemann <[email protected]>

Trying yet another syntax to make `uncertainties` one of two options when pandas==2.1.0 and pint==0.23rc0. Signed-off-by: Michael Tiemann <[email protected]>

Try using null instead of "" for uncertainties that we don't want to run. Signed-off-by: Michael Tiemann <[email protected]>

Try using "null" to represent a condition where we don't want uncertainties installed. Signed-off-by: Michael Tiemann <[email protected]>

Try using "no-thank-you" instead of "null" for matrix permutations not intended to use uncertainties. Signed-off-by: Michael Tiemann <[email protected]>

Wrap `uncertainties` matrix parameter in `[]`. If we get this working, can try setting parameter to `''`' from several commits ago. Signed-off-by: Michael Tiemann <[email protected]>

Use single quotes when testing the value of `uncertainties`. Signed-off-by: Michael Tiemann <[email protected]>

Tweak conditional matrix logic by using a "same-but-different" value for `pint` value assignment. Signed-off-by: Michael Tiemann <[email protected]>

Untabify yaml file... Signed-off-by: Michael Tiemann <[email protected]>

Make uncertainties a default null value in the matrix. Signed-off-by: Michael Tiemann <[email protected]>

Attempt to explicitly add python-version and numpy to `uncertainties` case. Otherwise, pytest never gets installed. Signed-off-by: Michael Tiemann <[email protected]>

We cannot reference the matrix when building the matrix, so unroll all parameter values. Signed-off-by: Michael Tiemann <[email protected]>

Remove unused `uncertainties` from -pre and -master CI files. Only test one configuration of Python against uncertainties (trying to work around duplicate key problem). Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC · 2023-09-19T11:16:13Z

We now have CI/CD testing without uncertainties across a matrix, and additionally testing uncertainties in one specific matrix case (Python 3.9, Pandas 2.1.0, Pint 0.23rc0). That should make us good to go for an 0.6rc0, no?

MichaelTiemannOSC · 2023-09-25T16:38:01Z

Nudge @andrewgsavage

andrewgsavage · 2023-11-12T16:59:00Z

Sorry whenever I look at this I see that uses a numpy object type array, which I didn't expect to work... and get hung up thinking about well it works.

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasnt written with that in mind, and other than your issue test there's no testing for this a mix of different objects in numpy array.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a seperate ExtenisonArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties you've written in this PR can be moved to the UA and the PintArray can deal with the units and delegate the rest to the UA magnitude (like how pint's Quantity does so)

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

MichaelTiemannOSC · 2023-11-12T19:53:36Z

Sorry whenever I look at this I see that uses a numpy object type array, which I didn't expect to work... and get hung up thinking about well it works.

Me too 😆

There aren't any examples using an array of objects, so although it works, it's not really supported. Merging this implies support, which I'm not really sure is a good idea. The code wasnt written with that in mind, and other than your issue test there's no testing for this a mix of different objects in numpy array.

Having spent a fair bit of time inside the Pandas code, especially transitioning from 1.5.x to 2.x, they really have nicely generalized support for object types in NumPy arrays. I started off needing a few changes to Pandas to support this implementation initially, but over time every single change I originally needed was rendered unnecessary. I did manage to get some changes merged upstream (and have another one in the works, see pandas-dev/pandas#55825). That latter one is not because numpy doesn't tolerate object types, but because Pint Quantities have their own opinions about being put into arrays.

Does it make sense to rebase PintArrays on uarrays (which can be tuned to use numpy.ndarray when magnitudes are np.floating and use unumpy.uarray when they are UFloat? This notebook shows how easily things can be changed on the fly: https://github.com/Quansight-Labs/unumpy/blob/master/notebooks/01_user_facing.ipynb

reading this through again, that's kind of what I meant by making a seperate ExtenisonArray for uncertainties, say UncertaintyArray. The PintArray can then use the UncertaintyArray for the data, as it does with other pandas EAs. The extra logic needed to handle uncertainties you've written in this PR can be moved to the UA and the PintArray can deal with the units and delegate the rest to the UA magnitude (like how pint's Quantity does so)

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

One of the very last things I needed to do was to ensure that the block managers would not mess up allocation of arrays that initially contained only floats, but which would later need to contain an uncertainty. I realized this was the same problem if one started with an array of float64 and then something later changed one entry to a complex128. I went big and added comprehensive complex testing to Pandas (pandas-dev/pandas#54761) which did indeed expose the same problem. So I proposed using the uncertainties changes to solve the complex128 problems. The Pandas folks fixed that by always allocating a fresh array if something didn't fit, and those changes were no longer neccessary.

I'm not sure how well a UA would work with pint-pandas (it may still need some code to handle uncertainties) but would like to see how well that works before deciding on this.

Happy to continue the discussion, and happy to prototype. Just wanted to share my experiences so far.

andrewgsavage · 2023-11-13T14:48:55Z

there is this
https://github.com/varchasgopalaswamy/AutoUncertainties/blob/main/auto_uncertainties/pandas_compat.py

does it do what you need on the uncertanty progopation side?

MichaelTiemannOSC · 2023-11-13T15:57:20Z

I reviewed that approach in this comment: hgrecco/pint#1797 (comment)

Which is why I re-upped my suggestion to merge the changes I developed instead. If I missed something, I would be happy to look again.

andrewgsavage · 2023-11-14T17:11:55Z

I made a start at this, enough to show what I'm thinking with code:

arr = np.array([ufloat(1, 0.01), ufloat(2, 0.1)])
arr
array([1.0+/-0.01, 2.0+/-0.1], dtype=object)

ua = UncertaintyArray(arr,np.array([False,False]))
ua
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

pa = PintArray(ua,dtype="m")
pa
<PintArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: pint[meter]

pa.data
<UncertaintyArray>
[1.000+/-0.010, 2.00+/-0.10]
Length: 2, dtype: <class 'pint_pandas.uncertainty_array.UncertaintyType'>

Main benefits to this approach are:

an EA for uncertainties that can be used without units/pint, so can be used by more people
Uncertainty specific code lives in the UncertaintyArray (I hope!)

My initial thoughts are informed by the uncertainties implementation (which of course does use uarrays). Because of the way uncertainties wrap and unwrap everything, it's super-inconvenient to treat them as a first-class type that can live in an ExtensionArray. Maybe I didn't approach that correctly the first time, but I did try it and I couldn't make it work. Which is why I went with the flow and decided to let it do its magic on the magnitudes.

Do you have an example of this? I'm not familiar with uncertainties.

I started off by inherting from BaseMaskedArray, which uses a mask to know where the nans are. This may help with the nans issue, or it may not be needed. It's also not public... If avoiding the nans issue isn't needed it may be better to reserrect pandas-dev/pandas#45544

MichaelTiemannOSC · 2023-11-14T20:03:42Z

Who will be maintaining UncertaintyArray? Is this something we need to get the Pandas folks to adopt? Numpy? Pint depends on the uncertainties package (for Measurement). Getting Pandas to support uncertainties as a dtype would be quite a lift (see pandas-dev/pandas#49181 (review)). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does is magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps (see lmfit/uncertainties#163 (comment)).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

andrewgsavage · 2023-11-14T20:47:32Z

Who will be maintaining UncertaintyArray?

It should either be in uncertainties or in another module, uncertainties-pandas.

Is this something we need to get the Pandas folks to adopt? Numpy?

Shouldn't, but may need changes like you found for pandas.

Pint depends on the uncertainties package (for Measurement).

Can't remember exactly what the plans were, but I remember reading something, possibly to move that to another module... but noones got around to it in forever.

Getting Pandas to support uncertainties as a dtype would be quite a lift (see pandas-dev/pandas#49181 (review)). It could be that some day in the future, UncertaintyArray could be merged from Pint/Pint-Pandas into Pandas, but I think we have to assume that the work needs to start within our sphere, not theirs.

Yea I don't see that happening in a long time either. There's a few pandas related issues under uncertainties so I think that's where to make a PR. It may be better in another package to prevent a dependency on pandas.

As for the wrapping/unwrapping, a better way to say this is that uncertainties does is magic not by how it specializes arrays but by wrapping all the NumPy ufuncs that operate on uncertainty values (though as NumPy and uncertainties have diverged there are unintentional gaps (see lebigot/uncertainties#163 (comment)).

unumpy arrays are not derived from numpy arrays, they wrap/reimplement everything. Which is why we have the ugliness of unp.isna(uarray) vs. pd.isna(PandasArray). It seems like UncertaintyArray could implement the necessities of uarrays while providing a pandas-friendly interface so we can use isna naturally. Which would be great.

Let me know if you'd like to share a branch and collaborate. My code certainly exercises uncertainties, Pint, and Pint-Pandas!

Numpy functions are a long way off! but I suspect you can do similar to how pint does with __array_function__ to get desired results

MichaelTiemannOSC · 2023-11-14T21:28:37Z

xref: pandas-dev/pandas#14162, pandas-dev/pandas#20993, and pandas-dev/pandas#22290. The first two especially give strong encouragement to bring uncertainties into Pandas. I think that many of the 2.x changes make this much easier, though no clue yet on how to deal with the uncertainties package in CI/CD. I think a strong proof of concept in Pint-Pandas could break through into Pandas (which would be a win for all).

andrewgsavage · 2023-11-14T22:02:22Z

None of those issues imply the UncertaintyArray should live in pandas! Just that one should be made. Indeed in the first linked issue

better yet would be to write and ExtensionArray to have first class support in pandas.

I'd be very suprised if they'd merge it into pandas, hence my uncertainties or uncertainties-pandas suggestion.

Add support for UFloat in PintArray (hgrecco#139)

b5954fd

Signed-off-by: MichaelTiemann <[email protected]>

MichaelTiemannOSC mentioned this pull request Oct 16, 2022

Parse uncertainties hgrecco/pint#1615

Merged

5 tasks

MichaelTiemannOSC mentioned this pull request Oct 17, 2022

Unexpected results for the mean of a DataFrame of ufloat from the uncertainties package. pandas-dev/pandas#14162

Closed

MichaelTiemannOSC added 2 commits October 19, 2022 08:04

Fix failures and errors found by test_pandas_extensions test suite.

0ad1cf9

Big remaining problem is unhanshable UFloat type. Signed-off-by: MichaelTiemann <[email protected]>

Preserve incoming np.array when promoting float to ufloat in PintArray

52ab185

Signed-off-by: [email protected]

MichaelTiemannOSC mentioned this pull request Oct 21, 2022

Adopt NEP-37 and NEP-47 (array module and Python Array API) hgrecco/pint#1592

Open

MichaelTiemannOSC commented Oct 21, 2022

View reviewed changes

MichaelTiemannOSC added 6 commits October 24, 2022 22:50

Fix logic to detect heterogeneous arrays of Ufloats and floats.

f3cdcad

Signed-off-by: MichaelTiemann <[email protected]>

Add support for UFloat in PintArray (hgrecco#139)

3ffb617

Signed-off-by: MichaelTiemann <[email protected]>

Fix failures and errors found by test_pandas_extensions test suite.

2f89897

Big remaining problem is unhanshable UFloat type. Signed-off-by: MichaelTiemann <[email protected]>

Preserve incoming np.array when promoting float to ufloat in PintArray

dce2668

Signed-off-by: [email protected]

Fix logic to detect heterogeneous arrays of Ufloats and floats.

d4ca9f0

Signed-off-by: MichaelTiemann <[email protected]>

Merge branch 'ducks-unlimited' of https://github.com/MichaelTiemannOS…

c375aeb

…C/pint-pandas into ducks-unlimited Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC mentioned this pull request Jan 2, 2023

Question: can np.nan stand in for nan+/-0? lmfit/uncertainties#169

Closed

MichaelTiemannOSC added 3 commits January 3, 2023 11:44

Update pint_array.py

8b06708

Fix conditional expression. Signed-off-by: Michael Tiemann <[email protected]>

Update pint_array.py

c5b7926

Don't try to build a large-enough-to-hold-uncertainties array if we are not using uncertainties in any way. Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC mentioned this pull request Jun 25, 2023

Compatibility with 'uncertainties'? #124

Open

2nd attempt integrating uncertainties and CI/CD

108cb71

Allow pip install command to unpack uncertainties option into multiple arguments. Signed-off-by: Michael Tiemann <[email protected]>

MichaelTiemannOSC added 16 commits September 19, 2023 05:02

Only test uncertainties in ci.yml

82442ab

Don't test uncertainties in ci-pint-pre or ci-pint-master, as these don't have the necessary versions of Pandas or Pint to make the matrix include operation work. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

5d351fc

Also test Pint-Pandas without uncertainties. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

c4e3a06

Trying again with different syntax to avoid duplicate key issue. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

8d5feb9

Trying yet another syntax to make `uncertainties` one of two options when pandas==2.1.0 and pint==0.23rc0. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

bfe9a77

Try using null instead of "" for uncertainties that we don't want to run. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

25adf35

Try using "null" to represent a condition where we don't want uncertainties installed. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

afc3eb4

Try using "no-thank-you" instead of "null" for matrix permutations not intended to use uncertainties. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

e281dfc

Wrap `uncertainties` matrix parameter in `[]`. If we get this working, can try setting parameter to `''`' from several commits ago. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

a208163

Use single quotes when testing the value of `uncertainties`. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

296bbdc

Tweak conditional matrix logic by using a "same-but-different" value for `pint` value assignment. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

585f38d

Untabify yaml file... Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

75d4c56

Make uncertainties a default null value in the matrix. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

6988212

Attempt to explicitly add python-version and numpy to `uncertainties` case. Otherwise, pytest never gets installed. Signed-off-by: Michael Tiemann <[email protected]>

Update ci.yml

764b609

We cannot reference the matrix when building the matrix, so unroll all parameter values. Signed-off-by: Michael Tiemann <[email protected]>

Update ci-*.yml files

9ed23c1

Remove unused `uncertainties` from -pre and -master CI files. Only test one configuration of Python against uncertainties (trying to work around duplicate key problem). Signed-off-by: Michael Tiemann <[email protected]>

Merge branch 'master' into ducks-unlimited

b609001

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for UFloat in PintArray (#139) #140

Add support for UFloat in PintArray (#139) #140

MichaelTiemannOSC commented Oct 15, 2022 •

edited

Loading

MichaelTiemannOSC commented Oct 15, 2022

andrewgsavage commented Oct 16, 2022

MichaelTiemannOSC commented Oct 16, 2022

MichaelTiemannOSC commented Oct 16, 2022 •

edited

Loading

andrewgsavage commented Oct 16, 2022 •

edited

Loading

andrewgsavage commented Oct 16, 2022 •

edited

Loading

andrewgsavage commented Oct 16, 2022

MichaelTiemannOSC commented Oct 17, 2022 •

edited

Loading

MichaelTiemannOSC commented Oct 17, 2022

MichaelTiemannOSC commented Oct 17, 2022

andrewgsavage commented Oct 17, 2022

MichaelTiemannOSC Oct 21, 2022

MichaelTiemannOSC commented Nov 3, 2022

MichaelTiemannOSC commented Sep 18, 2023

MichaelTiemannOSC commented Sep 19, 2023

MichaelTiemannOSC commented Sep 25, 2023

andrewgsavage commented Nov 12, 2023

MichaelTiemannOSC commented Nov 12, 2023 •

edited

Loading

andrewgsavage commented Nov 13, 2023

MichaelTiemannOSC commented Nov 13, 2023

andrewgsavage commented Nov 14, 2023

MichaelTiemannOSC commented Nov 14, 2023

andrewgsavage commented Nov 14, 2023 •

edited

Loading

MichaelTiemannOSC commented Nov 14, 2023

andrewgsavage commented Nov 14, 2023

		@@ -24,6 +33,120 @@

		ureg = PintType.ureg

		import pandas._testing as tm

Add support for UFloat in PintArray (#139) #140

Are you sure you want to change the base?

Add support for UFloat in PintArray (#139) #140

Conversation

MichaelTiemannOSC commented Oct 15, 2022 • edited Loading

MichaelTiemannOSC commented Oct 15, 2022

andrewgsavage commented Oct 16, 2022

MichaelTiemannOSC commented Oct 16, 2022

MichaelTiemannOSC commented Oct 16, 2022 • edited Loading

andrewgsavage commented Oct 16, 2022 • edited Loading

andrewgsavage commented Oct 16, 2022 • edited Loading

andrewgsavage commented Oct 16, 2022

MichaelTiemannOSC commented Oct 17, 2022 • edited Loading

MichaelTiemannOSC commented Oct 17, 2022

MichaelTiemannOSC commented Oct 17, 2022

andrewgsavage commented Oct 17, 2022

MichaelTiemannOSC Oct 21, 2022

Choose a reason for hiding this comment

MichaelTiemannOSC commented Nov 3, 2022

MichaelTiemannOSC commented Sep 18, 2023

MichaelTiemannOSC commented Sep 19, 2023

MichaelTiemannOSC commented Sep 25, 2023

andrewgsavage commented Nov 12, 2023

MichaelTiemannOSC commented Nov 12, 2023 • edited Loading

andrewgsavage commented Nov 13, 2023

MichaelTiemannOSC commented Nov 13, 2023

andrewgsavage commented Nov 14, 2023

MichaelTiemannOSC commented Nov 14, 2023

andrewgsavage commented Nov 14, 2023 • edited Loading

MichaelTiemannOSC commented Nov 14, 2023

andrewgsavage commented Nov 14, 2023

MichaelTiemannOSC commented Oct 15, 2022 •

edited

Loading

MichaelTiemannOSC commented Oct 16, 2022 •

edited

Loading

andrewgsavage commented Oct 16, 2022 •

edited

Loading

andrewgsavage commented Oct 16, 2022 •

edited

Loading

MichaelTiemannOSC commented Oct 17, 2022 •

edited

Loading

MichaelTiemannOSC commented Nov 12, 2023 •

edited

Loading

andrewgsavage commented Nov 14, 2023 •

edited

Loading