Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FB] [PI-3478] Lenient cube arithmetic #3774

Merged
merged 33 commits into from
Aug 13, 2020

Conversation

bjlittle
Copy link
Member

@bjlittle bjlittle commented Aug 5, 2020

This PR fixes (pretty much) everything that is very, very wrong with cube arithmetic.

It builds on PR #3583 and PR #3739, and is related to the parent issue #3478.

TODO:

  • check IFunc
  • check apply_ufunc
  • check in_place maths
  • address the circular issue 😬
  • cherry pick master fixes
  • fix travis-ci failures
  • rationalise logger
  • fix default logging level
  • var_name fix for lenient metadata equality

Required to git cherry-pick the following PRs from master to resolve known travis-ci test failure issues:


The following tasks are out of scope for this particular PR, but will be addressed immediately afterwards. This approach is being taken in order to expedite this PR and unblock other critical PRs required for the imminent iris v3.0.0 release.

This suggested tasks to perform, in order are as follows:

  • add associated supporting documentation
  • add a whatsnew entries
    • iris.analysis.maths.apply_ufunc kwarg other_cube -> other, see here
    • iris.analysis.maths.IFunc.__call__ demote ValueError to logging message, akin to apply_ufunc for data_func.nin == 1, see here
  • check mapping rationalisation during _metadata_prepare
  • add inline comments
  • add numpy style doc-strings
  • add new cube arithmetic integration and unit test coverage
  • check coordinate points/bounds dtype promotion
  • check DimCoordMetadata vs CoordMetadata
  • extend iris.util.array_equal to support masked arrays
    • check coordinate points/bounds ndarray vs MaskedArray

Tasks pegged as future work. To be raised as separate individual issues and tackled for iris 3.1:

@bjlittle bjlittle force-pushed the lenient-metadata-maths branch from 590e62f to 52a39e9 Compare August 5, 2020 23:01
Copy link
Member

@lbdreyer lbdreyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am by no means done reviewing (!) I just thought there were some comments I wanted raise sooner rather than later to get the conversations started now, particularly the comment about rhs/lhs and src/tgt as that is really confusing me!

import numpy as np
from numpy import ma

import iris.analysis
from iris.common import SERVICES, Resolve
from iris.common.lenient import _lenient_client as lenient_client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure what the consequences of this. By importing _lenient client as lenient client, does this technically expose it as part of the public api? i.e. via iris.analysis.maths.lenient_client?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Yup, it's probably best to keep this as _lenient_client

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -838,39 +730,59 @@ def _binary_op_common(
`cube` and `cube.data`
"""
_assert_is_cube(cube)

skeleton = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here explaining what you mean by skeleton?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer As per the description task list, I've not finished adding comments or doc-strings, but yes, happy to add a comment for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -915,26 +827,57 @@ def _broadcast_cube_coord_data(cube, other, operation_name, dim=None):
return points


def _sanitise_metadata(cube, unit):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like quite a general function that could be useful elsewhere for purposes other than just maths. I'm worried if it was hidden away here, we may forget it's here. Perhaps you could move it to iris.common.metadata?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Hmmm not sure.

It does seem like a general function, but at the moment it's very specific to the metadata contract of maths, so I'm inclined to keep it as it is, and not expose it to the public API.

from iris.common import LENIENT


__all__ = ["Resolve"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this module to be a bit obscure. The name is not descriptive (iris.common.resolve.Resolve could really refer to anything) and there is no module docstring or docstring for the Resolve class. It requires reading a fair amount of the code before you can deduce what the intention behind it is.
My suggestion (in order of preference):

  1. Move the Resolve class to the iris.common.metada. Afterall it is about resolving metadata, so it makes sense to live there. Does it justify its own module
  2. Rename this module to iris.common.resolve_metadata and add a module docstring.

Copy link
Member Author

@bjlittle bjlittle Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer For me, this class doesn't belong in iris.common.metadata.

It's certainly using iris.common.metadata but it's primarily dealing with higher order objects, such as cubes, coordinates and factories. So I believe it stands in its own right.

Naming things can be the hardest part, so I'm open to suggestions, but I'd be keen to steer clear of incorporating the word metadata in the module or class name... as a term I think the use of metadata is already overloaded within iris.

Hmmm, I'll have a think...

Also, as I said previously, I've not finished fully commenting the code or adding doc-strings, so that will come.

self(lhs, rhs)

def __call__(self, lhs, rhs):
self._init(lhs, rhs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reason behind the added complexity of including a init, call and _init
when you only ever do instantiate it in iris.analysis.math._binary_op_common:

elif isinstance(other, iris.cube.Cube):
# prepare to resolve the cube operands and associated coordinate
# metadata into the resultant cube.
resolve = Resolve(cube, other)

So why not just have a init solely?

Copy link
Member Author

@bjlittle bjlittle Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer I don't want to restrict a Resolve instance to being only bound to the cubes that it's provided at instantiation.

I also want to support the pattern of creating a single resolve instance and calling it multiple times e.g.,

result = Resolve(lhs, rhs).cube(data)

and

resolver = Resolve()

for ...:
    result = resolver(lhs, rhs).cube(data)

items_dim=[], items_aux=[], items_scalar=[]
)
self.category_common = _CategoryItems(
items_dim=[], items_aux=[], items_scalar=[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just do this in the self._init function where you originally set these to None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Yup, I could do... but I was adopting the style of declaring all the public and private members in the Resolve._init and configuring them within the context of where they are populated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

coordinates common to both cubes.

This is only applicable to coordinates that are members of the
'aux_coords' or dim_coords' of the participating cubes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing an extra quote

Suggested change
'aux_coords' or dim_coords' of the participating cubes.
'aux_coords' or 'dim_coords' of the participating cubes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Nice spot 👀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# Map RHS cube to LHS cube, or smaller to larger cube rank.
if self.map_rhs_to_lhs:
src_cube = self.rhs_cube
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding the lhs/rhs and src/tgt thing really hard to follow. It doesn't seem consistent?
Here if self.map_rhs_to_lhs is True, then src=rhs and tgt=lhs. However, in self._metadata_resolve, on lines 783-789, if self.map_rhs_to_lhs is True, you set the args such that src=lhs and tgt=rhs

As a suggestion, why not set lhs/rhs as src/tgt once, in the init, then you only have to worry about src/tgt and no longer have to keep checking self.map_rhs_to_lhs

Copy link
Member Author

@bjlittle bjlittle Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer That's understandable. I totally get that...

But these concepts are pretty core to Resolve, and it took me a while to settle on the best way to do this. But there are two important concepts at play here, that are subtly related, but different.

Essentially, the lhs and rhs refer to the cube operands, and these never change. They bind to the order of the cubes that are presented to Resolve.__init__ or Resolve.__call__. In the context of cube arithmetic, these are the lhs and the rhs operands of the binary arithmetic operation. It's important to maintain the order of the cubes that are given to Resolve, and not swap them around, ever.

During the process of preparing to reconcile or resolve the lhs and rhs cubes into a single resultant cube, one cube must map to the other i.e., you could think of it as, one of the cubes will be the target for an in-place operation. Typically, it's the cube with the higher dimensionality, in that case that's the tgt and the other is the src, but this may change due to broadcasting and free dimensions (i.e., anonymous dimensions with no coordinate metadata describing them).

Also, this situation gets more involved when the lhs and rhs have the same dimensionality... the point is, the src and tgt are not akin to the lhs and rhs, as the mapping of src to tgt can change depending on several factors which are determined at runtime as the lhs and rhs are compared and analysed.

Fundamentally, the use of src and tgt, and it's ability to change, is paramount to provisioning commutative results.

We should talk more about this, to help clarify... also more doc-strings and comments would help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Also, the Resolve.mapping, is tightly coupled to the concept of src and tgt, where the Resolve.mapping.keys() are the src cube dimensions, and the Resolve.mapping.values() are the tgt cube dimensions, always.
The freedom and flexibility to invert this Resolve.mapping relationship, which is required at appropriate times, is totally independent of the lhs and rhs original operation ordering.
That's a simple, yet important concept, that makes quite complex behaviour simply possible.

from iris.common import LENIENT


__all__ = ["Resolve"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this part of the public API? Why not just make it private for now?

Copy link
Member Author

@bjlittle bjlittle Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbdreyer Simply because Resolve deprecates other functionality within iris, hence it requires to be part of the public API.

Also, its functionality is quite generic and not necessarily tied to cube arithmetic. It provides a very powerful way to combine two cubes into one resultant cube.

@bjlittle bjlittle marked this pull request as ready for review August 12, 2020 20:28
)


def apply_ufunc(
ufunc, cube, other_cube=None, new_unit=None, new_name=None, in_place=False
ufunc, cube, other=None, new_unit=None, new_name=None, in_place=False
Copy link
Member Author

@bjlittle bjlittle Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change to the API for apply_ufunc, but it is completely warranted, as changing the kwarg from other_cube to other aligns this with the rest of the iris.analysis.maths public API.

Consequently, there will be an appropriate entry in the whatsnew for the community for iris 3.0.0

f"provided data_func {self._data_func_name!r} only requires "
"1 input"
)
logger.debug(dmsg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a logging debug message rather than raising a ValueError, which now aligns the IFunc.__call__ behaviour for a data_func.nin == 1 to that of apply_ufunc with a ufunc.nin == 1.
This will have an appropriate whatsnew entry for the community.

@bjlittle bjlittle force-pushed the lenient-metadata-maths branch from 2856eb9 to 4ae36ae Compare August 13, 2020 11:57
@abooton
Copy link
Contributor

abooton commented Aug 13, 2020

@ldreyer has kindly carried out the review of the changes in this PR, and so I am merging this PR into the cube-arithmetic feature branch, with the caveat that the outstanding items (list in PR description) are to be addressed prior to the Iris3.0 release.

@abooton abooton merged commit 0e15178 into SciTools:cube-arithmetic Aug 13, 2020
bjlittle added a commit to bjlittle/iris that referenced this pull request Aug 14, 2020
* initial cube arithmetic

* support in-place cube resolve

* fix non in-place broadcasting

* remove temporary resolve scenario test

* lenient/strict support for attributes dicts with numpy arrays

* lenient/strict treatment of scalar coordinates

* strict points/bounds matching

* lenient/strict prepare local dim/aux/scalar coordinates

* support extended broadcasting

* always raise exception on points/bounds mismatch

* ignore scalar points/bounds mismatches, lenient only

* remove todos

* tidy logger debugs

* qualify src/tgt cube references in debug

* Numpy rounding fix (SciTools#3758)

ensure rounding is numpy like (maintains type)

* avoid unittest.mock.sentinel copy issue

* fast load np.int32

* fix cube maths doctest

* fix iris.common.resolve logging configuration

* fix prepare points/bounds + extra metadata cml

* support mapping reversal based on free dims

* var_name fix for lenient equality

* add support for DimCoordMetadata

* fix circular flag + support CoordMetadata and DimCoordMetadata exchange

* fix circular issue for concatenate DimCoord->AuxCoord demotion

* fix concatenate._CubeSignature sorted

* minor tweaks

* keep lenient_client private in maths

* tidy maths

* tidy iris.analysis.maths.IFunc

* refactor IFunc test

* polish in-place support

* tidy metadata_resolve

Co-authored-by: stephenworsley <[email protected]>
abooton pushed a commit that referenced this pull request Aug 14, 2020
* PI-3478: Common metadata API (#3583)

* common metadata api

* rationalise _cube_coord_common into common

* move state into metadata

* MetadataFactory test coverage

* temporarily pin back iris-grib

* test coverage for iris.common.metadata._BaseMeta

* test coverage for iris.common.mixin.LimitedAttributeDict

* remove temporary iris-grib pin

* review actions

* Update lib/iris/tests/unit/common/metadata/test_BaseMetadata.py

Co-Authored-By: lbdreyer <[email protected]>

* [FB] [PI-3478] Lenient metadata (#3739)

* add lenient infra-structure

* add metadata lenient __eq__ support

* complete __eq__, combine and difference support

* explicit inherited lenient_service + support equal convenience

* fix attributes difference + lenient kwargs

* make lenient public + minor tidy

* rename MetadataManagerFactory to metadata_manager_factory

* extend lenient_client decorator to support services registration

* add lenient test coverage

* purge qualname usage in metadata.py

* support global enable for lenient services

* support partial mapping metadata assignment

* purge Lenient.__setattr__ from api

* add BaseMetadata compare test coverage

* metadata rationalisation

* add BaseMetadata difference test coverage

* added context manager ephemeral comment clarification

* add BaseMetadata __ne__ test coverage

* standardise lenient decorator closure names

* add BaseMetadata equal test coverage

* half dunder context

* add AncillaryVariableMetadata test coverage

* add additional AncillaryVariableMetadata test coverage

* add CellMeasureMetadata test coverage

* Clarify lenient_service operation + simplify code.

* add CoordMetadata test coverage

* add CubeMetadata test coverage

* metadata tests use self.cls

* fix typo

* fix context manager ephemeral services

* add logging

* Pin pillow to make graphics tests work again. (#3630)

* Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (#3655)

* Switched use of datetime.weekday() to datetime.dayofwk. (#3687)

* New image hashes for mpl 3x2 (#3682)

* New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0.

* Further images changes for mpl3x2.

* Yet more updated image results.

* fix sentinel uniqueness test failure

* remove redundant cdm mapping test

* difference returns None for no difference

* protect Lenient and LENIENT private

* privitise lenient framework and add API veneer

* add explicit maths feature default

* review actions

* review actions

* trexfeathers review actions

* stephenworsley review actions

Co-authored-by: Patrick Peglar <[email protected]>
Co-authored-by: Martin Yeo <[email protected]>

* [FB] [PI-3478] Lenient cube arithmetic (#3774)

* initial cube arithmetic

* support in-place cube resolve

* fix non in-place broadcasting

* remove temporary resolve scenario test

* lenient/strict support for attributes dicts with numpy arrays

* lenient/strict treatment of scalar coordinates

* strict points/bounds matching

* lenient/strict prepare local dim/aux/scalar coordinates

* support extended broadcasting

* always raise exception on points/bounds mismatch

* ignore scalar points/bounds mismatches, lenient only

* remove todos

* tidy logger debugs

* qualify src/tgt cube references in debug

* Numpy rounding fix (#3758)

ensure rounding is numpy like (maintains type)

* avoid unittest.mock.sentinel copy issue

* fast load np.int32

* fix cube maths doctest

* fix iris.common.resolve logging configuration

* fix prepare points/bounds + extra metadata cml

* support mapping reversal based on free dims

* var_name fix for lenient equality

* add support for DimCoordMetadata

* fix circular flag + support CoordMetadata and DimCoordMetadata exchange

* fix circular issue for concatenate DimCoord->AuxCoord demotion

* fix concatenate._CubeSignature sorted

* minor tweaks

* keep lenient_client private in maths

* tidy maths

* tidy iris.analysis.maths.IFunc

* refactor IFunc test

* polish in-place support

* tidy metadata_resolve

Co-authored-by: stephenworsley <[email protected]>

* rebase master fix-up for cube arithmetic

* add missing new dependency to readthedocs.yml requirements

Co-authored-by: lbdreyer <[email protected]>
Co-authored-by: Patrick Peglar <[email protected]>
Co-authored-by: Martin Yeo <[email protected]>
Co-authored-by: stephenworsley <[email protected]>
@bjlittle bjlittle deleted the lenient-metadata-maths branch February 15, 2021 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants