-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FB] [PI-3478] Lenient metadata #3739
Conversation
c6bce3d
to
c439579
Compare
Only briefly scanned so far. (1) (2) |
5d9a7b5
to
f14f3df
Compare
Like @lbdreyer (I think?), I find the existing "LENIENT.context" API rather unintuitive : hard to use, and hard to understand. However, with current code, I think a basic problem with that is that we can't use service identifiers as keywords, even though we have standard string forms for them, because the strings are not valid keywords ! |
(and, probably related to above point...) I do appreciate that the binary lenient/strict choice is a really useful simplification in the existing proposals, but I still think we might want to leave space to extend this in future. E.G. you might want a form that allows you have specify "load with a merge which is lenient on attributes but strict on names" ?
N.B. Sorry this isn't at all a clear suggestion yet |
I've crossed over with this. |
A couple of notes about the testing, from work on pp-mo#62
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have some tests to review, but I thought I'd submit the comments I have so far.
I have a few minor comments about the iris.common.lenient
module, but most of my more pertinent comments are about iris.comon.metadata
, so my preference is to get those resolved first if possible
result = ( | ||
self.coord_system == other.coord_system | ||
and self.climatological == other.climatological | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the plan was to be lenient with coord_system
but strict with climatological
?
So for:
cube_a.coord_system = Mercator
cube_b.coord_system = None
Should compare as True but this code would return False.
However, if we do want to be strict with both, could you rework this to use _members
? It may save us in future when we add an extra member to CoordMetadata; saves us having to remember to also change it here?
Something like:
result = all([self.field == other.field for field in self._members])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is getting interesting, and to the heart of the question "what should be lenient?"
Personally, I think that climatological
should be compared strictly in lenient mode - I guess simply because it's boolean, and mixing in leniency doesn't really help matters, so keep it simple.
For coord_system
I flipped on this a few times. But at the end of the day I think that it's a pretty important core piece of metadata that makes a massive difference to interpreting the associated spatial coordinates. I think that it's so important that users should always specify it and not rely on defaults or assumptions. We've certainly not made any statements regarding the default coord_system
if it's not specified. So in that regard I erred on the side of comparing strictly in lenient mode - it seems like a non-controversial, safe stance to take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went explicit when comparing, but happy to be more generic as you suggest. I've applied that pattern elsewhere, so why not here 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK: almost any time that a cube coordinate is used, the coordinate system will make important differences (certainly in plotting, for example), so it makes sense to always check equality strictly. As @bjlittle says, the fact that we don't have a default coordinate system means we should also treat None
as explicitly different to a specified coordinate system (IMHO).
value = ( | ||
self.cell_methods | ||
if self.cell_methods == other.cell_methods | ||
else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit undecided about this. What if we have:
cube_a.cell_methods = [cell_method_a, cell_method_b]
cube_b.cell_methods = [cell_method_a]
So they both have cell_method_a
, but the first cube also has another cell_method which the second cube doesn't have.
Currently this just throws both away, but should we consider keeping the common cell_method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm it's a good question... again a case where I'm not sure so erred on the side of being strict in lenient mode.
I feel like it kinda, sorta depends on the operation the user is performing, and it's not possible to guess the context at this particular lower level in the code. My naive philosophy here was that if the cell methods are the same then it's pretty obvious to know what to do, otherwise don't even attempt to guess what to do (as it might be half right, or even worse completely wrong) so leave the user to apply their intelligence and experience to the metadata.
As a concrete example, consider adding a 3d cube (t, y, x
) and a 2d (y, x
) cube. They both have cell methods, but the 3d cube has two cell methods on the t
and y
dimensions, whereas the 2d cube has a single cell method on the y
dimension. Now the result of the addition
operation is a 3d cube (thru the wonders of broadcasting), and you may argue that the resultant cube should also have two cell methods - having one cell method would be misleading. In actual fact, the context of the operation lives with the addition
operation, it can make a reasonable assertion of what the cell methods should be, and as it happens it throws away the cell methods, and I kinda agree with that. Even so, the user can easily add them back, if they care and it makes sense to them.
So, after revisiting this and coming full circle again, I'm back to the same conclusion that what I've proposed in this PR seems reasonable, and a safe stance to take (and don't ask me tomorrow, as I may change my mind 😄)
Anyways, I'm kinda glad that your finding it "non-obvious" and grappling with the same conundrums that I found myself pondering over 😉
I think we just need to be a fair mix of reasonable and opinionated here.... not unless I'm missing completely obvious.
I've no idea if this has been of any help, whatsoever to you...
) | ||
|
||
@lenient_service | ||
def combine(self, other, lenient=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have you decided to make these public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what do you mean? Make what public? The method? Or the lenient
keyword argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that CubeMetadata.combine
CubeMetadata.difference
and CubeMetadata.equal
are a part of the public API, so users could happily use them, and expect that in upcoming minor releases their code would still work?
I am possibly being a bit overcautious with what we add to the public API, since this is for a major release, so it would be a long time before we could make changes to anything public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original intention was always for these methods to be public.
I've always seen them as offering stable behaviour (i.e., they're not going to fundamentally change from as is now) and that they offer real, tangible benefit and value not only to us for internal iris
use, but also to users - they now have some rich behaviour to go along with the metadata overhaul that we've banked so far through the common metadata API.
So for me adding them completes this whole common metadata API that we're now offering, and I don't associate any risk with that (certainly no more so that introducing anything else new into iris
).
I think that it would be a mistake not to make them public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments, specifically relating to metadata testing.
I still have to review the lenient.py testing but I'm hoping to do that tomorrow
def test_cannot_compare(self): | ||
other = CubeMetadata(*(None,) * len(CubeMetadata._fields)) | ||
result = self.metadata.__eq__(other) | ||
self.assertIs(NotImplemented, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a more descriptive name for this test. Maybe something lik
test_cannot_compare_diff_class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can imagine, I was running low on "creative test naming juice"... Happy to update it 👍
rmetadata = self.cls(**right) | ||
|
||
self.assertTrue(lmetadata._compare_lenient(rmetadata)) | ||
self.assertTrue(rmetadata._combine_lenient(lmetadata)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_combine_lenient -> _compare_lenient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa... freaky test pass 😱
Nice spot, I'll fix that 👍
|
||
result = lmetadata._combine_lenient(rmetadata) | ||
expected = list(left.values()) | ||
self.assertEqual(expected, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that for most tests you check that the operation is commutative, for example in the next test (test_strict_units_different
) you do :
result = lmetadata._combine_lenient(rmetadata)
as well as
result = rmetadata._combine_lenient(lmetadata)
For this test, however, you only do the one way,
result = lmetadata._combine_lenient(rmetadata)
I suppose it might be a bit overkil to do both? Did you only do it the one way when you were comparing the metadata objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I was rolling through the testing (while took a looooong time) my testing approach slowly matured and I progressed.
The commutative testing angle came along mid-way through, and I've forgotten to apply it retrospectively - nice spot 👀
I defo think it's worthwhile doing. So I'll sweep back through the tests and apply this where it might be applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that I latched onto the commutative pattern of testing post BaseMetadata
testing. So I'll add some similar coverage there now where it's missing... thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to bank my comments on the functional code in case I don't finish looking at the tests (not really keeping up with everyone else!).
The implementation makes sense and is follow-able 👍 I have some hang-ups about the generality of some of the error messages, which I commented on in the code, but I wouldn't consider these deal-breakers.
result = ( | ||
self.coord_system == other.coord_system | ||
and self.climatological == other.climatological | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK: almost any time that a cube coordinate is used, the coordinate system will make important differences (certainly in plotting, for example), so it makes sense to always check equality strictly. As @bjlittle says, the fact that we don't have a default coordinate system means we should also treat None
as explicitly different to a specified coordinate system (IMHO).
Okay I'm finished reviewing the tests now and I don't have any outstanding issues I want to raise. So that means... 🥁 (drumroll) 🥁 ... I am going to merge this in! Just in time for the Friday deadline 🙌 Lots of hard work has gone in to this PR! 💯 💯 A big thank to @bjlittle and everyone who helped to review!! |
@lbdreyer Awesome! Thanks for pushing this PR over the line, much appreciated... this now unblocks me for the next and last PR in this feature branch, which will finally fix cube arithmetic 😄 |
* add lenient infra-structure * add metadata lenient __eq__ support * complete __eq__, combine and difference support * explicit inherited lenient_service + support equal convenience * fix attributes difference + lenient kwargs * make lenient public + minor tidy * rename MetadataManagerFactory to metadata_manager_factory * extend lenient_client decorator to support services registration * add lenient test coverage * purge qualname usage in metadata.py * support global enable for lenient services * support partial mapping metadata assignment * purge Lenient.__setattr__ from api * add BaseMetadata compare test coverage * metadata rationalisation * add BaseMetadata difference test coverage * added context manager ephemeral comment clarification * add BaseMetadata __ne__ test coverage * standardise lenient decorator closure names * add BaseMetadata equal test coverage * half dunder context * add AncillaryVariableMetadata test coverage * add additional AncillaryVariableMetadata test coverage * add CellMeasureMetadata test coverage * Clarify lenient_service operation + simplify code. * add CoordMetadata test coverage * add CubeMetadata test coverage * metadata tests use self.cls * fix typo * fix context manager ephemeral services * add logging * Pin pillow to make graphics tests work again. (SciTools#3630) * Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (SciTools#3655) * Switched use of datetime.weekday() to datetime.dayofwk. (SciTools#3687) * New image hashes for mpl 3x2 (SciTools#3682) * New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0. * Further images changes for mpl3x2. * Yet more updated image results. * fix sentinel uniqueness test failure * remove redundant cdm mapping test * difference returns None for no difference * protect Lenient and LENIENT private * privitise lenient framework and add API veneer * add explicit maths feature default * review actions * review actions * trexfeathers review actions * stephenworsley review actions Co-authored-by: Patrick Peglar <[email protected]> Co-authored-by: Martin Yeo <[email protected]>
* PI-3478: Common metadata API (#3583) * common metadata api * rationalise _cube_coord_common into common * move state into metadata * MetadataFactory test coverage * temporarily pin back iris-grib * test coverage for iris.common.metadata._BaseMeta * test coverage for iris.common.mixin.LimitedAttributeDict * remove temporary iris-grib pin * review actions * Update lib/iris/tests/unit/common/metadata/test_BaseMetadata.py Co-Authored-By: lbdreyer <[email protected]> * [FB] [PI-3478] Lenient metadata (#3739) * add lenient infra-structure * add metadata lenient __eq__ support * complete __eq__, combine and difference support * explicit inherited lenient_service + support equal convenience * fix attributes difference + lenient kwargs * make lenient public + minor tidy * rename MetadataManagerFactory to metadata_manager_factory * extend lenient_client decorator to support services registration * add lenient test coverage * purge qualname usage in metadata.py * support global enable for lenient services * support partial mapping metadata assignment * purge Lenient.__setattr__ from api * add BaseMetadata compare test coverage * metadata rationalisation * add BaseMetadata difference test coverage * added context manager ephemeral comment clarification * add BaseMetadata __ne__ test coverage * standardise lenient decorator closure names * add BaseMetadata equal test coverage * half dunder context * add AncillaryVariableMetadata test coverage * add additional AncillaryVariableMetadata test coverage * add CellMeasureMetadata test coverage * Clarify lenient_service operation + simplify code. * add CoordMetadata test coverage * add CubeMetadata test coverage * metadata tests use self.cls * fix typo * fix context manager ephemeral services * add logging * Pin pillow to make graphics tests work again. (#3630) * Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (#3655) * Switched use of datetime.weekday() to datetime.dayofwk. (#3687) * New image hashes for mpl 3x2 (#3682) * New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0. * Further images changes for mpl3x2. * Yet more updated image results. * fix sentinel uniqueness test failure * remove redundant cdm mapping test * difference returns None for no difference * protect Lenient and LENIENT private * privitise lenient framework and add API veneer * add explicit maths feature default * review actions * review actions * trexfeathers review actions * stephenworsley review actions Co-authored-by: Patrick Peglar <[email protected]> Co-authored-by: Martin Yeo <[email protected]> * [FB] [PI-3478] Lenient cube arithmetic (#3774) * initial cube arithmetic * support in-place cube resolve * fix non in-place broadcasting * remove temporary resolve scenario test * lenient/strict support for attributes dicts with numpy arrays * lenient/strict treatment of scalar coordinates * strict points/bounds matching * lenient/strict prepare local dim/aux/scalar coordinates * support extended broadcasting * always raise exception on points/bounds mismatch * ignore scalar points/bounds mismatches, lenient only * remove todos * tidy logger debugs * qualify src/tgt cube references in debug * Numpy rounding fix (#3758) ensure rounding is numpy like (maintains type) * avoid unittest.mock.sentinel copy issue * fast load np.int32 * fix cube maths doctest * fix iris.common.resolve logging configuration * fix prepare points/bounds + extra metadata cml * support mapping reversal based on free dims * var_name fix for lenient equality * add support for DimCoordMetadata * fix circular flag + support CoordMetadata and DimCoordMetadata exchange * fix circular issue for concatenate DimCoord->AuxCoord demotion * fix concatenate._CubeSignature sorted * minor tweaks * keep lenient_client private in maths * tidy maths * tidy iris.analysis.maths.IFunc * refactor IFunc test * polish in-place support * tidy metadata_resolve Co-authored-by: stephenworsley <[email protected]> * rebase master fix-up for cube arithmetic * add missing new dependency to readthedocs.yml requirements Co-authored-by: lbdreyer <[email protected]> Co-authored-by: Patrick Peglar <[email protected]> Co-authored-by: Martin Yeo <[email protected]> Co-authored-by: stephenworsley <[email protected]>
This PR introduces a new lenient client/service infrastructure and also extends the common metadata API.
This is the second PR that underpins issue #3478 and builds on PR #3583.
The lenient infrastructure offers a new workflow to
iris
that allows us (and users) to easily control the strictness of operations. This has been applied tometadata
equality, combination and difference operations in this PR, but in principle lenient operations could easily be extended to other areas ofiris
e.g.,merge
andconcatenate
etcThe common metadata API has been extended to also support equality, inequality, combination and difference operations in both strict and lenient modes.
The
iris.common.metadata
module also supports its own logging, which includes a root logger foriris
. Ideally, I'd like to overhaul the obscene use ofwarnings
within Iris with replace their use with a logging based infrastructure instead - but that's for another day, and certainly not in this PR.In addition, it is now possible to assign a partial
Mapping
, or metadata from anotheriris
instance to any other e.g., the followingjust works 😄
Required to
git cherry-pick
the following PRs frommaster
to resolve knowntravis-ci
test failure issues:TBD:
upstream/master
to fixtravis-ci
iris.common.metadata._NamedTupleMeta
changetoken
usage to explicit"_members"
iris.common.lenient.Lenient.register_client(..., append=False)
lenient.py
test coveragemetadata.py
test coverageBaseMetadata
combine
difference
equal
AncillaryVariableMetadata
CellMeasureMetadata
CoordMetadata
Cubemetadata
lenient/strict
debugLenient.context
ephemeral__context
client services append toactive
client services?Lenient
API veneer to_Lenient
Open questions:
Lenient.context
client services - clobber vs append?__eq__
,compare
anddifference
operationsdifference
result of equivalent namedtuple metadata instances -None
vs instance?Follow-up PRs:
@lenient_service(enable=<boolean>)
andLenient.register_service(<service, enable=<boolean>)
Lenient.__dict__
clients and services namespace separationMetadataManager.values
propertyiris
logging infrastructure