docs #191

andrewgsavage · 2023-06-27T21:40:47Z

Closes # (insert issue number)
Executed pre-commit run --all-files with no errors
The change is fully covered by automated unit tests
Documented in docs/ as appropriate
Added an entry to the CHANGES file

https://pint-pandas.readthedocs.io/en/docs/


MichaelTiemannOSC · 2023-06-30T16:00:26Z

Would it be useful to document the general implications and idioms of using ExtensionArrays? I'm gradually learning that when converting an ndarray into a PandasArray, np.nan becomes NA (and vice-versa in the other direction). Helping people understand how NA and np.nan play inside of Quantities, and the most efficient idioms for dealing with them correctly (pd.isna vs. np.isnan) could be very helpful. I could help write it if you tell me where you think it belongs.
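
A minimal sketch of the NA/np.nan round trip described above, using only pandas and numpy. It assumes the array in question is a nullable masked array (the `"Float64"` dtype is chosen here for illustration and is not named in the comment), and it does not involve Quantities at all:

```python
import numpy as np
import pandas as pd

# A plain ndarray that uses np.nan as its missing-value marker.
raw = np.array([1.0, np.nan, 3.0])

# Assumption: a nullable "Float64" array stands in for the extension array
# discussed above. On construction, np.nan is stored as pd.NA.
masked = pd.array(raw, dtype="Float64")

# Going back to an ndarray, pd.NA can be mapped to np.nan again.
roundtrip = masked.to_numpy(dtype="float64", na_value=np.nan)

# pd.isna gives the same mask for either representation.
print(pd.isna(raw))
print(pd.isna(masked))
```

Because `pd.isna` understands both markers, it is the safer check when it is not known which representation a value came from.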

andrewgsavage · 2023-07-01T09:42:14Z

I don't see many issues relating to nans so I'm wondering if you're encountering this because you're doing less typical workflows. It would be worth making an issue with your findings to understand where they're coming from.

I expect it to be something to do with PintArrays either having PandasArrays or some form of np.array holding the values. I wonder if a better way for uncertainties would be to create an UncertaintyArray that the PintArray can use for values?

MichaelTiemannOSC · 2023-07-01T09:49:46Z

You are one step ahead of me. Last night I put my finger on what seems to be the last problem in my own test cases (the pint_pandas test cases don't trip it). When pd.merge needs to fill unmatched values with NaNs, it was creating invalid ndarrays due to the NaN value I've created. I'll write up findings when I have more to report, but I think I have a handle on a way forward. Thanks!
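
For context, a small sketch of where those fill values come from, using plain float columns rather than PintArrays or uncertainties. The frame and column names are made up for illustration, and this does not reproduce the invalid-ndarray problem itself:

```python
import pandas as pd

# Two frames whose keys only partly overlap (hypothetical data).
left = pd.DataFrame({"key": ["a", "b"], "x": [1.0, 2.0]})
right = pd.DataFrame({"key": ["b", "c"], "y": [10.0, 20.0]})

# An outer merge keeps every key, so pandas has to fill the cells that
# have no match on the other side with a missing value.
merged = pd.merge(left, right, on="key", how="outer")

print(merged)
print(merged.isna().sum())
```

Any custom value type stored in the merged columns has to cope with whatever fill value pandas inserts at this step.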

MichaelTiemannOSC · 2023-07-04T15:07:43Z

I've made a lot of progress working with pd.NA and reading through dtype("O") and validating values as UFloat. I think it might be more elegant to create and use an UncertaintyArray, but I want to try to finish what I've almost got working, then discuss how to possibly make it more elegant with an UncertaintyArray type.

The test cases I'm looking at right now are the complex128 test cases, which, because they are actually EAs, and not ComplexArray types, are tickling what I've done in unexpected ways. Which is a good way to ensure the robustness of what I'm doing, rather than hiding behind a fresh type (I think).

andrewgsavage · 2023-08-25T22:43:20Z

has anyone had a chance to look at this?
you can view the docs here
https://pint-pandas.readthedocs.io/en/docs/

MichaelTiemannOSC · 2023-08-26T02:14:18Z

I just submitted some fresh changes to enable testing of complex128 for Pandas 2.1.0rc0+96. Over the past few weeks the pandas team have progressively improved underlying code so that as of today, essentially no special adaptations are required.

I still need to see if I can similarly simplify uncertainties, but I think that when 2.1 comes out things are going to be a lot simpler (both to document and implement).

MichaelTiemannOSC · 2023-08-26T18:35:18Z

Is there a convenient way I can leave comments inline? Like comments on a pull request?

andrewgsavage · 2023-08-26T23:44:03Z

Is there a convenient way I can leave comments inline? Like comments on a pull request?

in the .rst files that have been added

MichaelTiemannOSC · 2023-08-27T13:25:09Z

OK, so I cloned the repo and made a change to getting started, which in my version reads:

The Pandas package provides powerful DataFrame and Series abstractions for dealing with numerical, temporal, categorical, string-based, and even user-defined data (using its ExtensionArray feature). The Pint package provides a rich and extensible vocabulary of units for constructing Quantities and an equally rich and extensible range of unit conversions to make it easy to perform unit-safe calculations using Quantities. Pint-pandas provides PintArray, a Pandas ExtensionArray that efficiently implements Pandas DataFrame and Series functionality as unit-aware operations where appropriate.

Those who have used Pint know well that good units discipline often catches not only simple mistkaes, but sometimes more fundamental errors as well. Pint-pandas can reveal similar errors when it comes to slicing and dicing Pandas data. A 1-dimensional Pandas Series can use a PintArray to hold its values. Columns in a 2-dimensional Pandas DataFrame can contain PintArrays--with all the efficiency the ExtensionArray APIs provide, but rows are a special case. If all elements of the row have the same units, the row will be returned as a Series backed by a PintArray with those units. But if the units are heterogeneous, the row will be returned as a Series consisting of discrete Quantities (or raw data if the column values don't have units). All Quantity data within such Series will follow Pint rules of unit conversions and will give error messages when units are not compatible, but some error messages may lose information as Pandas tries to align two incompatible Quantities to non-unitized magnitude values. To get the greatest benefit from Pint-pandas (and Pandas in general), make your columns from homogeneous data and let your rows contain the heterogeneous data when necessary.

The reason I'm telling you this in a comment and not with a PR is because I DON'T UNDERSTAND GITHUB!!! I really thought I did the right things in terms of cloning, forking, editing, etc., but GitHub insists on doing things most unintuitive to me. If I can get some help sorting out how to put my carefully placed andrewgsavage/pint-pandas repo into a properly described and defined git place that doesn't make it look like the twin of MichaelTiemannOSC/pint-pandas, I'd appreciate it. I do have hgrecco/pint and hgrecco/pint-pandas properly separated. I just somehow didn't say all the right magic when I tried to make a change relative to your repo as my upstream source.

andrewgsavage · 2023-08-27T14:01:16Z

you can add comments inline by going to Files changed, clicking a file, then clicking the blue + that appears when hovering over a line
adding comments like that is fine

MichaelTiemannOSC · 2023-08-27T14:02:09Z

That's a good solution...

andrewgsavage · 2023-08-27T14:03:44Z

if you want to make changes in a PR, you'll make a branch under MichaelTiemannOSC/pint-pandas that tracks andrewgsavage/pint-pandas:docs, then make a PR to andrewgsavage/pint-pandas:docs (ie go to https://github.com/andrewgsavage/pint-pandas/pulls )

andrewgsavage · 2023-08-27T14:06:20Z

I'll add this bit:

The Pandas package provides powerful DataFrame and Series abstractions for dealing with numerical, temporal, categorical, string-based, and even user-defined data (using its ExtensionArray feature). The Pint package provides a rich and extensible vocabulary of units for constructing Quantities and an equally rich and extensible range of unit conversions to make it easy to perform unit-safe calculations using Quantities. Pint-pandas provides PintArray, a Pandas ExtensionArray that efficiently implements Pandas DataFrame and Series functionality as unit-aware operations where appropriate.

Those who have used Pint know well that good units discipline often catches not only simple mistkaes, but sometimes more fundamental errors as well. Pint-pandas can reveal similar errors when it comes to slicing and dicing Pandas data.

I think this bit is too detailed for the getting started section, but could fit elsewhere

A 1-dimensional Pandas Series can use a PintArray to hold its values. Columns in a 2-dimensional Pandas DataFrame can contain PintArrays--with all the efficiency the ExtensionArray APIs provide, but rows are a special case. If all elements of the row have the same units, the row will be returned as a Series backed by a PintArray with those units. But if the units are heterogeneous, the row will be returned as a Series consisting of discrete Quantities (or raw data if the column values don't have units). All Quantity data within such Series will follow Pint rules of unit conversions and will give error messages when units are not compatible, but some error messages may lose information as Pandas tries to align two incompatible Quantities to non-unitized magnitude values. To get the greatest benefit from Pint-pandas (and Pandas in general), make your columns from homogeneous data and let your rows contain the heterogeneous data when necessary.

MichaelTiemannOSC · 2023-08-27T14:08:49Z

Please pass through a spell-check first. I notice I misspelled mistakes!

andrewgsavage · 2023-08-27T14:09:14Z

I think this bit is too detailed for the getting started section, but could fit elsewhere
An example would make this clearer and could go under common issues?

docs/getting/index.rst

docs/getting/tutorial.rst

docs/user/reading.rst

MichaelTiemannOSC · 2023-08-28T11:44:05Z

docs/ecosystem.rst

+
+- `pint <https://github.com/hgrecco/pint>`_ Base package
+- `pint-pandas <https://github.com/hgrecco/pint-pandas>`_ Pandas integration
+- `pint-xarray <https://github.com/xarray-contrib/pint-xarray>`_ Xarray integration


openscm-units <https://github.com/openscm/openscm-units> units related to simple climate modelling

iam_units <https://github.com/IAMconsortium/units> Pint-compatible definitions of energy, climate, and related units to supplement the SI and other units included in Pint's default_en.txt

docs/getting/tutorial.rst

MichaelTiemannOSC · 2023-08-28T12:48:49Z

docs/user/common.rst

+
+The most common issue pint-pandas users encounter is that they have a DataFrame with columns that aren't PintArrays.
+An obvious indicator is unit strings showing in cells when viewing the DataFrame.
+Several pandas operations return numpy arrays of ``Quantity`` objects, which can cause this.


Quantity objects within Pandas DataFrames (or Series) will behave like Quantities, meaning that they are subject to unit conversion rules and will raise errors when incompatible units are mixed. But these loose Quantities don't offer the elegance or performance optimizations that come from using PintArrays. And they may give strange error messages as Pandas tries to convert incompatible units to dimensionless magnitudes (which is often prohibited by Pint) rather than naming the incompatibility between the two Quantities in question.

Add:

Creating DataFrames from Series

The default operation of Pandas pd.concat function is to perform row-wise concatenation. When given a list of Series, each of which is backed by a PintArray, this will inefficiently convert all the PintArrays to arrays of object type, concatenate the several series into a DataFrame with that many rows, and then leave it up to you to convert that DataFrame back into column-wise PintArrays. A much more efficient approach is to concatenate Series in a column-wise fashion:

df = pd.concat(list_of_series, axis=1)

This will preserve all the PintArrays in each of the Series.

Quantity objects within Pandas DataFrames (or Series) will behave like Quantities, meaning that they are subject to unit conversion rules and will raise errors when incompatible units are mixed. But these loose Quantities don't offer the elegance or performance optimizations that come from using PintArrays. And they may give strange error messages as Pandas tries to convert incompatible units to dimensionless magnitudes (which is often prohibited by Pint) rather than naming the incompatibility between the two Quantities in question.

It took me 2-3 reads to follow this. Referring to quantity objects and loose quantities is ambiguous (one could argue a PintArray contains quantity objects, since getitem returns them). Some code showing your points would be clearer. Can do that in another PR.

I think that the benchmark PR I made makes this point really clearly (up to 1000x performance difference). So if/when that lands, we can refer to coding (and performance examples) from that.

MichaelTiemannOSC · 2023-09-01T10:05:59Z

Plot twist: the next version of Pandas (2.1.1? 2.2.0?) will allow EAs to support 2d values, which means that the one-dimensional explanations I've given above will no longer be quite correct. Of course pint-pandas could make the decision that PintArrays are only ever one-dimensional, and we can clean up the text to say that. But we could also allow for the possibility that a whole 2-dimensional DataFrame has quantities, so that rows and columns alike can not only be shown as quantified, but can also have values set within them via .loc and .iloc while retaining their EA nature.


andrewgsavage · 2023-09-05T23:49:03Z

bors r+

bors · 2023-09-05T23:53:02Z

Build succeeded!


ci
lint

andrewgsavage added 10 commits June 27, 2023 21:46

docs

50a6406

readthedocs

2d16990

conf

5a4c005

docs

1ef1ff8

readthedocs

0c478ac

conf

c37b0e0

Merge branch 'docs' of https://github.com/andrewgsavage/pint-pandas i…

fce5bdc

…nto docs

ossies

005129c

changes

7a2e1ad

lint

330e246

Merge branch 'master' into docs

64017fc

MichaelTiemannOSC reviewed Aug 27, 2023

docs/getting/index.rst (outdated, resolved)

MichaelTiemannOSC reviewed Aug 27, 2023

docs/getting/tutorial.rst (resolved)

MichaelTiemannOSC reviewed Aug 28, 2023

docs/user/reading.rst (resolved)

MichaelTiemannOSC reviewed Aug 28, 2023

docs/getting/tutorial.rst (resolved)

MichaelTiemannOSC reviewed Aug 28, 2023

andrewgsavage added 2 commits September 6, 2023 00:43

michael review

6f94a94

Merge branch 'docs' of https://github.com/andrewgsavage/pint-pandas i…

19322e5

…nto docs

bors bot merged commit 068ded0 into hgrecco:master Sep 5, 2023
28 checks passed
