
upstream xclim.core.units? #780

Closed
dcherian opened this issue Jul 22, 2021 · 10 comments
Labels
enhancement New feature or request standards / conventions Suggestions on ways forward

@dcherian
dcherian commented Jul 22, 2021

Hello,

It looks like large parts of xclim.core.units could be upstreamed to cf-xarray and pint-xarray. Would you be interested in doing this?

The unit registry definitions could be moved to cf-xarray now that it provides a pint registry that aims to be CF-aware (mostly copied from MetPy right now).

pint2cfunits looks particularly useful as a custom format for "dequantifying" pint-backed xarray objects (cc @keewis, @TomNicholas, @jthielen).

def pint2cfunits(value: UnitDefinition) -> str:

@aulemahal
Collaborator

Yes! That is one of our long-term goals. Currently, all unit handling is done "by hand" inside xclim, which is suboptimal. I think moving as much as we can upstream is a great idea.

The registry modifications here are also based on MetPy, but augmented as other issues arose.

I don't think we will be able to remove all unit handling from xclim, but switching to integrated units will be a plus. I'll try to look into this when I have time, but I don't think anyone from the core team will have much time to work on this in the next month.

@aulemahal aulemahal self-assigned this Jul 26, 2021
@aulemahal aulemahal added this to the xclim Long Term Goals milestone Jul 26, 2021
@aulemahal aulemahal added the enhancement New feature or request label Jul 26, 2021
@keewis
keewis commented Oct 10, 2021

posted first as jbusecke/xMIP#167 (comment):

FYI, the custom unit formats PR was merged into pint, so it should now be possible to register xclim.core.units.pint2cfunits as "cfunits" or "cf" and get something like f"{quantity:.3f#cfunits}" to work (documentation is still missing, though).

see hgrecco/pint#1375 and hgrecco/pint#1371

@dcherian
Author

Nice work @keewis. This looks like a good time to upstream this to cf_xarray.units!

@keewis
keewis commented Oct 10, 2021

just tried it:

In [7]: import pint
   ...: 
   ...: pint.formatting._FORMATTERS.pop("cf", None)
   ...: 
   ...: @pint.register_unit_format("cf")
   ...: def pint2cfunits(unit, registry, **options) -> str:
   ...:     """Return a CF-compliant unit string from a `pint` unit.
   ...: 
   ...:     Parameters
   ...:     ----------
   ...:     unit : pint.UnitContainer
   ...:         Input unit.
   ...:     registry : pint.UnitRegistry
   ...:         the associated registry
   ...:     **options
   ...:         Additional options (may be ignored)
   ...: 
   ...:     Returns
   ...:     -------
   ...:     out : str
   ...:         Units following CF-Convention, using symbols.
   ...:     """
   ...:     import re
   ...: 
   ...:     # convert UnitContainer back to Unit
   ...:     unit = registry.Unit(unit)
   ...:     # Print units using abbreviations (millimeter -> mm)
   ...:     s = f"{unit:~D}"
   ...: 
   ...:     # Search and replace patterns
   ...:     pat = r"(?P<inverse>/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"
   ...: 
   ...:     def repl(m):
   ...:         i, u, p = m.groups()
   ...:         p = p or (1 if i else "")
   ...:         neg = "-" if i else ("^" if p else "")
   ...: 
   ...:         return f"{u}{neg}{p}"
   ...: 
   ...:     out, n = re.subn(pat, repl, s)
   ...: 
   ...:     # Remove multiplications
   ...:     out = out.replace(" * ", " ")
   ...:     # Delta degrees:
   ...:     out = out.replace("Δ°", "delta_deg")
   ...:     return out.replace("percent", "%")
   ...: 
   ...: ureg = pint.application_registry
   ...: u = ureg.Unit("m ** 2 / (s ** 3 * kg)")
   ...: display(u)
   ...: display(f"{u:cf}")
<Unit('meter ** 2 / kilogram / second ** 3')>
'm^2 kg-1 s-3'

which I think should do the same as pint2cfunits.

The caret seems a bit odd to me (I was assuming it would return m2 kg-1 s-3), but this seems intentional and I don't know that much about cfunits.

@huard
Collaborator

huard commented Oct 12, 2021

This is nice!

@aulemahal
Collaborator

Indeed that's nice!

So as for upstreaming, I understand that for the moment it concerns the following elements:

  • Unit and dimensionalities definitions.
    • Does this include the "hydro" context? This is something we use to convert water flux units without having to explicitly specify the water's density. But it might be too magic for cf-xarray.
  • pint2cfunits, registered as shown above.
  • units2pint: to read the (CF) units of a DataArray into a pint object.
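For illustration, the string rewriting a units2pint-style reader has to do can be sketched in plain Python. `cf_to_pint_expr` is a hypothetical helper, not xclim's or cf-xarray's actual implementation: it assumes the CF string is a space-separated product of symbols with optional integer exponents (with or without a caret) and ignores offsets, scale factors and reference-time units such as "days since ...". The output is the kind of expression pint's string parser is expected to accept (e.g. via `ureg.Unit(...)`), but that last step is not shown.

```python
import re

# One CF/UDUNITS factor: a symbol, an optional caret, an optional signed integer exponent.
_FACTOR = re.compile(r"^(?P<sym>[A-Za-z_%]+)\^?(?P<exp>-?\d+)?$")


def cf_to_pint_expr(cf_units: str) -> str:
    """Rewrite a CF unit string like "m2 kg-1 s-3" as "m**2 * kg**-1 * s**-3".

    Hypothetical helper: only handles products of symbols with integer exponents.
    """
    factors = []
    for token in cf_units.split():
        m = _FACTOR.match(token)
        if m is None:
            raise ValueError(f"unsupported unit token: {token!r}")
        sym, exp = m.group("sym"), m.group("exp")
        # Exponent 1 is implicit in CF strings, so keep the bare symbol.
        factors.append(f"{sym}**{exp}" if exp else sym)
    return " * ".join(factors)


print(cf_to_pint_expr("m2 kg-1 s-3"))  # m**2 * kg**-1 * s**-3
```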

Other tools of xclim.core.units are for unit manipulation and AFAIU this is left to pint[-xarray] for the moment?

@keewis

keewis commented Oct 12, 2021

I'd keep the "hydro" context in xclim for now: I can't really decide whether or not it fits into cf-xarray, and pint-xarray doesn't support contexts yet – we still need to figure out how to best support those.

units2pint should already be in cf-xarray (see cf_xarray.units) and pint-xarray, and if something is missing we can extend them.

As for the unit manipulation, if you can list what exactly you have, I can help figure out what pint-xarray already does, what it might do in the future and what should be kept in xclim (if anything at all).

@aulemahal
Collaborator

Quite honestly, I don't like the "hydro" context myself, but it reflects an issue I have seen in many CF-oriented projects: conversion from flux to amount is often implicit in the context of water. In xclim, we are trying to remove the need for this by explicitly converting from rate to amount and vice versa.
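The implicit step that a "hydro"-style context hides is a division by the density of water. A minimal arithmetic sketch, assuming a density of 1000 kg m-3 (so 1 kg m-2 of water corresponds to a 1 mm deep layer); the function name is hypothetical and this is not how xclim's pint context is implemented:

```python
# Assumed density of liquid water in kg m-3: the constant the "hydro" context leaves implicit.
WATER_DENSITY = 1000.0
SECONDS_PER_DAY = 86400.0


def mass_flux_to_mm_per_day(flux_kg_m2_s: float, density: float = WATER_DENSITY) -> float:
    """Convert a water mass flux (kg m-2 s-1) to a depth rate (mm d-1)."""
    depth_rate_m_per_s = flux_kg_m2_s / density  # (kg m-2 s-1) / (kg m-3) = m s-1
    return depth_rate_m_per_s * 1000.0 * SECONDS_PER_DAY  # m -> mm, s-1 -> d-1


print(round(mass_flux_to_mm_per_day(1e-5), 3))  # 0.864
```

Making the density an explicit argument is the point: the conversion is only valid under that physical assumption, which the caller now has to state.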

xclim's units submodule contains:

  • pint_multiply: Multiply a DataArray by a pint Quantity and return a DataArray with correct (CF) units.
  • str2pint: Convert a string representing either a quantity or only units to a pint Quantity.
  • convert_units_to: Convert a string, DataArray or pint Quantity to the units parsed from a DataArray or Quantity. Returns a DataArray or a number.
  • infer_sampling_units: Infer the sampling frequency of a DataArray and return its magnitude and units (e.g. daily data returns (1, 'd')).
  • to_agg_units: Infer, convert and set the correct units after an aggregating operation. For example, "counting" events along the time dimension returns units corresponding to the sampling frequency. Especially useful when calculating degree-days indicators.
  • rate2amount and amount2rate: Convert between the two, assuming the flux is constant over the whole period described by each timestamp.
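A minimal sketch of the constant-flux rate/amount conversion described in the last bullet, using plain lists and `datetime` objects instead of DataArrays and ignoring units metadata. `rates_to_amounts` and `amounts_to_rates` are hypothetical stand-ins, not xclim's API, and giving the final period the same length as the one before it is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def _period_seconds(times: Sequence[datetime]) -> List[float]:
    """Length in seconds of the period starting at each timestamp.

    Assumption: each period ends at the next timestamp; the last period
    is given the same length as the one before it.
    """
    if len(times) < 2:
        raise ValueError("need at least two timestamps to infer period lengths")
    steps = [(b - a).total_seconds() for a, b in zip(times[:-1], times[1:])]
    return steps + [steps[-1]]


def rates_to_amounts(rates: Sequence[float], times: Sequence[datetime]) -> List[float]:
    """Amount accumulated in each period, assuming the rate (per second) is constant over it."""
    return [r * s for r, s in zip(rates, _period_seconds(times))]


def amounts_to_rates(amounts: Sequence[float], times: Sequence[datetime]) -> List[float]:
    """Inverse conversion: mean rate (per second) over each period."""
    return [a / s for a, s in zip(amounts, _period_seconds(times))]


hours = [datetime(2000, 1, 1) + timedelta(hours=h) for h in range(3)]
print(rates_to_amounts([2.0, 0.5, 1.0], hours))  # [7200.0, 1800.0, 3600.0]
```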

@Zeitsperre Zeitsperre added question standards / conventions Suggestions on ways forward labels Oct 19, 2021
@dcherian
Author

dcherian commented Oct 24, 2021

The caret seems a bit odd to me

CF follows UDUNITS so it should be m2 or meter^2 (https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2lib.html#Examples).

Perhaps we should have two versions: one for m2 and another for meter^2?
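A sketch of the two spellings suggested above, operating on a hypothetical list of `(symbol, exponent)` pairs: the default joins factors as `m2 kg-1 s-3`, and `caret=True` reproduces the `m^2 kg-1 s-3` form from keewis' snippet. Passing full names instead of symbols gives the `meter^2` style. Inside a pint unit-format function, such pairs could presumably be built from the unit container's names and exponents (e.g. via `registry.get_symbol`), but that wiring is an assumption and is not shown here.

```python
from typing import Iterable, Tuple


def format_cf_units(factors: Iterable[Tuple[str, float]], caret: bool = False) -> str:
    """Join (symbol, exponent) pairs into a UDUNITS-style unit string.

    Hypothetical helper: caret=False gives "m2 kg-1 s-3";
    caret=True puts a caret before positive exponents ("m^2 kg-1 s-3").
    """
    parts = []
    for symbol, exp in factors:
        if exp != int(exp):
            # This sketch only handles integer exponents.
            raise ValueError(f"non-integer exponent for {symbol!r}: {exp}")
        exp = int(exp)  # print 2 rather than 2.0
        if exp == 1:
            parts.append(symbol)  # exponent 1 is implicit
        elif caret and exp > 0:
            parts.append(f"{symbol}^{exp}")
        else:
            parts.append(f"{symbol}{exp}")  # e.g. "m2" or "s-3"
    return " ".join(parts)


print(format_cf_units([("m", 2), ("kg", -1), ("s", -3)]))              # m2 kg-1 s-3
print(format_cf_units([("m", 2), ("kg", -1), ("s", -3)], caret=True))  # m^2 kg-1 s-3
```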

@tlogan2000
Collaborator

closing for #1010
