Question: can np.nan stand in for nan+/-0? #169
Thanks for the interesting details. Now, can you show an example of what you want to do? I'm not fully seeing the problem yet (in part because uncertainties automatically promotes NaN to NaN±0).
Here is some sample code. It does not play well with the current version of pint-pandas 0.3, but I have a pull-request that does make it work (hgrecco/pint-pandas#140). My latest iteration doesn't show the problems I think will exist due to having multiple
So... writing up my findings for the day: when Pint-Pandas makes the ndarray that holds the values of a PintArray, it's really best to allocate for ufloats if there are any ufloats to be seen, or if there are NaNs, which are the initial values in an "empty" array. If we allocate our ndarray for float64 too soon, especially when all we see are NaNs, those arrays cannot later hold ufloats.

The performance cost of allocating object arrays is well known, but I'm seeing general happiness whereby dataframes filled with PintArrays do what is expected. What's cool (?) is that one cannot tell off-hand whether a PintArray with dtype='pint[kg]' is a float64-based or uncertainties-based array. It. Just. Works. There are still lots of edge cases to work out, but at the end of this day, I have something that's largely behaving and it's not throwing 10,000+ warning messages about "units stripped" or "casting to float" or whatever. So that's progress.
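The allocation point above can be sketched with plain NumPy. `FakeUFloat` below is a hypothetical stand-in for `uncertainties.UFloat`, used only so the sketch runs without the uncertainties package:

```python
import numpy as np

class FakeUFloat:
    """Hypothetical stand-in for uncertainties.UFloat, for illustration only."""
    def __init__(self, nominal, std_dev):
        self.nominal_value = nominal
        self.std_dev = std_dev

# An "empty" array of NaNs allocated too soon as float64 ...
too_soon = np.array([np.nan, np.nan], dtype="float64")
try:
    too_soon[0] = FakeUFloat(1.0, 0.1)  # cannot hold a ufloat later
    held_ufloat = True
except (TypeError, ValueError):
    held_ufloat = False  # float64 storage only accepts real numbers

# ... whereas an object array holds NaNs, plain floats, and ufloats side by side.
mixed = np.array([np.nan, np.nan], dtype=object)
mixed[0] = FakeUFloat(1.0, 0.1)
mixed[1] = 2.5
```

This is why allocating object-dtype up front, whenever ufloats or only-NaNs are in play, avoids a later re-allocation.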
I'm trying to use uncertainties with Pandas, Pint, and Pint-Pandas. Pint-Pandas makes it easy to have quantified values on a column basis that don't interact much with other columns (or at least not badly).
uncertainties relies on wrappers to do its work, whereas Pint and Pint-Pandas now make thorough use of ExtensionArrays to interact with Pandas. ExtensionArrays define an na_value for their dtype, which for most numeric types is np.nan.
In my past dealings with uncertainties, the NaN for that has been nan+/-0, which has been fine, except that it now makes for difficult promotion rules. If I have an extension array of quantities (tons of CO2, millions of USD, whatever) with normal float64 magnitudes, the correct na_value for that is np.nan. But if I fill the array with uncertainties as magnitudes, the logical na_value would be nan+/-0. There is, however, no concept of multiple na_values depending on whether there are uncertainties in the mix.
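A per-column rule like this could, in principle, key the na_value off the backing dtype. A minimal sketch, assuming a hypothetical helper `na_value_for` and using a `(nominal, std_dev)` tuple as a placeholder for `uncertainties.ufloat(nan, 0)` so no extra package is needed:

```python
import numpy as np

def na_value_for(values: np.ndarray):
    """Hypothetical helper: choose the missing-value sentinel per column.

    float64-backed magnitudes use np.nan; object-backed (uncertainties)
    magnitudes would use ufloat(nan, 0), represented here by a
    (nominal, std_dev) tuple placeholder.
    """
    if values.dtype == np.dtype("float64"):
        return np.nan
    return (float("nan"), 0.0)  # placeholder for uncertainties.ufloat(nan, 0)

float_col = np.zeros(3, dtype="float64")   # plain-magnitude column
ufloat_col = np.zeros(3, dtype=object)     # uncertainties-magnitude column
```

The catch, as described above, is that pandas expects a single na_value per ExtensionDtype, not one per column.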
One solution is to just bite the bullet and say "if you use uncertainties anywhere, then every dataframe needs to honor them, meaning that the na_value for ANYTHING is nan+/-0 (and all magnitudes must promote to UFloat)." What I'd like to do is to manage that column-by-column.
Is there a world in which np.nan is a fully adequate value for uncertainties, with whatever promotions, substitutions, etc. happening within the wrappers? Or do I need to majorly rethink my approach of layering these various abstractions (uncertainties, quantities, DataFrames) together?
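The first option can be sketched as a decorator that promotes bare float NaNs on the way into an uncertainties-aware function. Both `promote_nan` and `UncertainNaN` are hypothetical stand-ins, not part of uncertainties' API:

```python
import functools
import math

class UncertainNaN:
    """Hypothetical stand-in for uncertainties.ufloat(nan, 0)."""
    nominal_value = float("nan")
    std_dev = 0.0

def promote_nan(func):
    """Promote bare float NaNs to an 'uncertain NaN' before calling func."""
    @functools.wraps(func)
    def wrapper(*args):
        promoted = [UncertainNaN() if isinstance(a, float) and math.isnan(a) else a
                    for a in args]
        return func(*promoted)
    return wrapper

@promote_nan
def nominal(x):
    # Behaves like an uncertainties-aware accessor: plain numbers pass through.
    return getattr(x, "nominal_value", x)
```

Under this approach np.nan stays the universal sentinel at the pandas layer, and the promotion to nan+/-0 happens only at the wrapper boundary.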