Converting a pint.Quantity to PintArray when adding to a pd.DataFrame #248

Open
Musaefendic opened this issue Aug 12, 2024 · 3 comments

@Musaefendic

Description

I tried adding a pint.Quantity to an existing pd.DataFrame, thinking that pint-pandas might transform the Quantity into a PintArray that:

  • matches the number of rows
  • preserves the unit.

Reproducible Example

import pandas as pd
from pint_pandas import PintArray, Quantity

# Existing pd.DataFrame
data = {"bar": [0.07, 0.30, 0.85, 1.00]}
df = pd.DataFrame(data)

# Trying to add a `pint.Quantity`
df['content'] = Quantity(42.0, units='percent')

df.dtypes 

# Output:
# bar      float64
# content  object    # <---- Expected: pint[percent]

output:

    bar       content
0  0.07  42.0 percent
1  0.30  42.0 percent
2  0.85  42.0 percent
3  1.00  42.0 percent

Question

The documentation does suggest using a pd.Series or a PintArray to achieve this, but that approach feels a bit verbose. I’d like to add a new column directly from a single Quantity, mirroring how pandas broadcasts a float when it is assigned as a new column of an existing pd.DataFrame.

Would it make sense to convert a pint.Quantity into a PintArray when adding it to a pd.DataFrame?

@andrewgsavage
Collaborator

An ExtensionArray must have a defined length, making your suggestion not possible.

Yes, it would be very nice to have. To get this to work, pandas needs to identify the quantity as a scalar of PintType and then try to construct a PintArray from it.
pandas-dev/pandas#27995

@mutricyl
Contributor

mutricyl commented Oct 4, 2024

I started to work on the pandas side of this issue: https://github.com/mutricyl/pandas/tree/27995_infer_EA_from_obj

I came across a constructor issue with pint_pandas:

>>> import pandas as pd
>>> import numpy as np
>>> import pint_pandas
>>> km = pd.Series([1.0, 2.0, 3.0], dtype="pint[km]")
>>> ndarray_object = km.to_numpy()  # creates a numpy array of Quantity with dtype == object
>>> ndarray_object
array([<Quantity(1.0, 'kilometer')>, <Quantity(2.0, 'kilometer')>,
       <Quantity(3.0, 'kilometer')>], dtype=object)
>>> pint_pandas.PintArray(ndarray_object)
NotImplementedError
>>> pint_pandas.PintArray(ndarray_object, dtype=type(ndarray_object))
ValueError: could not construct PintType

Am I using the PintArray constructor improperly, or should we be able to construct a PintArray from an ndarray of Quantities?

@andrewgsavage
Collaborator

fyi I had a go at fixing it here
pandas-dev/pandas#59767

yes, ideally that should also work
