As demonstrated in #5348, there are some problems with the existing handling of these attributes in netcdf data.
Broadly,
at present these are not treated specially but passed from (netcdf) input to (netcdf) output unchanged.
... which means they will generally be 'inherited' by cubes and coordinates (etc.) which are in some way derived from the input ones, and then applied to those derived objects when they are written out.
... which is a problem, because the content of such 'derived' things may have completely different meanings, units etc., and hence a different valid value range ...
... so the original 'valid range' may no longer be appropriate for the saved data.
... and this can inappropriately affect the results when that data is read back in.
(For context: 'derived' could mean regridding, cube arithmetic, statistics, units conversion or whatever. In the #5348 testcase, the use of 'intersection' produces a modified longitude coordinate; a sketch of that failure mode follows below.)
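To make the problem concrete, here is a minimal sketch of the failure mode, assuming a hypothetical input file whose longitude coordinate carries a valid_range attribute (file names and values are invented for illustration):

```python
import iris

# Hypothetical example file: its longitude coordinate variable carries, say,
# valid_range = [0., 360.]  (file names and values here are illustrative only).
cube = iris.load_cube("input.nc")

# 'intersection' produces a modified longitude coordinate, e.g. wrapped into
# -180..180, but any inherited valid_range attribute is carried over unchanged.
region = cube.intersection(longitude=(-180, 180))
iris.save(region, "region.nc")

# On re-load, the netcdf layer masks the longitude points that now fall
# outside the stale valid_range -- which is not what anyone intended.
reloaded = iris.load_cube("region.nc")
```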
Some background
I already took a long look into the various groups of netcdf attributes which Iris handles in different ways, in the context of the ongoing project to improve handling of global+local attributes: see my previous summary in this comment.
Behaviour of current code (v3.6.0)
The "valid_range", "valid_min" and "valid_max" attributes are not treated specially: they are simply passed through from input to output (unlike scale_factor and add_offset, which are handled as part of the data encoding).
netcdf4 behaviour
In fact the netcdf4-python library docs (e.g. here) don't make it clear that the library respects "valid_range" in this way, but actual practice and this code show that it definitely does, treating it as equivalent to valid_min/valid_max (which it does document).
The effect is simply that, on read, points outside a valid range are masked.
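This is easy to confirm with netCDF4 alone; the following is a small self-contained check of that behaviour (file name and numbers are arbitrary):

```python
import netCDF4
import numpy as np

# Write a tiny file with a valid_range attribute on the variable.
with netCDF4.Dataset("demo.nc", "w") as ds:
    ds.createDimension("x", 4)
    var = ds.createVariable("data", "f8", ("x",))
    var.valid_range = np.array([0.0, 10.0])
    var[:] = [1.0, 5.0, 50.0, -3.0]

# On read, with the default auto-masking, the out-of-range points come back
# masked -- just as they would for valid_min / valid_max.
with netCDF4.Dataset("demo.nc") as ds:
    print(ds.variables["data"][:])
    # e.g. masked_array(data=[1.0, 5.0, --, --],
    #                   mask=[False, False, True, True], ...)
```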
Proposed changes
In short, I think it is arguable that these attributes are part of the low-level encoding and interpretation of netcdf variable data, and as such "ought" to be treated by Iris in the same way as scale_factor and add_offset.
What that might mean ...
on load, a valid-range causes the netcdf data-fetch to mark additional points as 'masked',
so logically, masked points are already our internal Iris representation of that information, and that is what we should rely on.
on save, Iris can't automatically determine a 'valid range' as a concept separate from the masked points, and so I think it should never write these attributes -- except possibly by specific user request.
So I propose that:
on load, we should discard these attributes, exactly as we do for scale_factor, add_offset and _FillValue (see the sketch after this list).
on save, we should not create these attributes by default.
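In code terms, the load-side part might amount to little more than extending the set of attributes that get stripped before building the cube or coord. A rough sketch, using invented names rather than actual Iris internals:

```python
# Attributes consumed by the data-encoding layer, and therefore NOT copied onto
# the loaded cube / coordinate.  (These names are a sketch, not Iris code.)
_ENCODING_ATTRS = {
    "scale_factor", "add_offset", "_FillValue",
    "valid_range", "valid_min", "valid_max",
}

def user_attributes(nc_attributes: dict) -> dict:
    """Return only the attributes that should survive onto the loaded object."""
    return {
        name: value
        for name, value in nc_attributes.items()
        if name not in _ENCODING_ATTRS
    }
```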
Remaining queries
Feedback wanted on these
(1) user overrides
In order to support the "by specific request" idea, it would also be logical to add the valid_xxx as possible entries in the 'packing' keyword of the netcdf Saver.write method.
N.B. however, just as with scale_factor/add_offset, this naturally restricts the usage to cube data, and does not apply to coordinates, as originally raised in #5348.
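For illustration, the existing 'packing' usage and the hypothetical extension might look something like this (the valid_min/valid_max entries are the proposal, not anything currently supported):

```python
import iris

cube = iris.load_cube("input.nc")

# Existing usage: pack the data to int16 with a scale/offset.
iris.save(cube, "packed.nc",
          packing={"dtype": "int16", "scale_factor": 0.1, "add_offset": 5.0})

# Hypothetical extension (NOT currently supported): also request valid-range
# attributes on the output variable.
iris.save(cube, "packed_with_range.nc",
          packing={"dtype": "int16", "scale_factor": 0.1, "add_offset": 5.0,
                   "valid_min": 0, "valid_max": 1000})
```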
In the cause of treating them "just like scale_factor and add_offset", it would also seem logical to disallow these attributes in Iris attribute dictionaries.
However, I'm not sure that the rationale here is really the same as for scaling/offset ...
If you define scale_factor or add_offset for a variable -- typically along with a different dtype -- then that affects how the data is actually stored.
But AFAICT it would in this case also be "safe" to allow the user to explicitly set valid-range attributes in the attributes dictionary, and these would then simply be written with the variable.
In that approach, we would not exclude them from attributes dictionaries, and would not need to support them in 'packing' either. That certainly seems simpler, and it is applicable to coords etc. as well as cubes.
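In that simpler approach, user code would look roughly like this (values are illustrative, and it relies only on attributes being written through as they are today):

```python
import iris

cube = iris.load_cube("input.nc")

# Explicitly request valid-range attributes on the data variable ...
cube.attributes["valid_min"] = 0.0
cube.attributes["valid_max"] = 1000.0

# ... and, unlike 'packing', the same works for a coordinate too.
cube.coord("longitude").attributes["valid_range"] = [-180.0, 180.0]

iris.save(cube, "with_valid_range.nc")
```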
(2) compatibility control
The above schemes will not be fully backwards-compatible, but do seem like an improved standard behaviour for the future.
So we should probably consider introducing a FUTURE switch for it.
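If so, usage might look something like the following sketch, where the flag name is invented purely for illustration:

```python
import iris

# Purely illustrative: the flag name below is invented and does not exist.
with iris.FUTURE.context(discard_netcdf_valid_range=True):
    cube = iris.load_cube("input.nc")
    # ... under the new behaviour, any valid_range / valid_min / valid_max
    # attributes would have been dropped on load.
```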