Document use of new_axis to control merge #6180

Merged (6 commits), Oct 23, 2024
docs/src/further_topics/controlling_merge.rst (new file: 198 additions, 0 deletions)
.. _controlling_merge:

=================
Controlling Merge
=================


Sometimes it is not possible to combine a :class:`~iris.cube.CubeList` appropriately using merge and concatenate on
their own. In such cases, the :func:`~iris.util.new_axis` utility provides much finer control over how cubes are
combined. Consider the following set of cubes:

>>> import iris
>>> file_1 = iris.sample_data_path("time_varying_hybrid_height", "*_001.pp")
>>> file_2 = iris.sample_data_path("time_varying_hybrid_height", "*_002.pp")
>>> cubes = iris.load([file_1, file_2], "x_wind")
>>> print(cubes[0])
x_wind / (m s-1) (model_level_number: 5; latitude: 144; longitude: 192)
Dimension coordinates:
model_level_number x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
level_height x - -
sigma x - -
surface_altitude - x x
Derived coordinates:
altitude x x x
Scalar coordinates:
forecast_period 1338840.0 hours, bound=(1338480.0, 1339200.0) hours
forecast_reference_time 2006-01-01 00:00:00
time 2160-12-16 00:00:00, bound=(2160-12-01 00:00:00, 2161-01-01 00:00:00)
Cell methods:
0 time: mean (interval: 1 hour)
Attributes:
STASH m01s00i002
source 'Data from Met Office Unified Model'
um_version '12.1'
>>> print(cubes[1])
x_wind / (m s-1) (model_level_number: 5; latitude: 144; longitude: 192)
Dimension coordinates:
model_level_number x - -
latitude - x -
longitude - - x
Auxiliary coordinates:
level_height x - -
sigma x - -
surface_altitude - x x
Derived coordinates:
altitude x x x
Scalar coordinates:
forecast_period 1339560.0 hours, bound=(1339200.0, 1339920.0) hours
forecast_reference_time 2006-01-01 00:00:00
time 2161-01-16 00:00:00, bound=(2161-01-01 00:00:00, 2161-02-01 00:00:00)
Cell methods:
0 time: mean (interval: 1 hour)
Attributes:
STASH m01s00i002
source 'Data from Met Office Unified Model'
um_version '12.1'

These two cubes vary along the time dimension; however, we are not able to combine them with
:meth:`~iris.cube.CubeList.merge_cube`, because their ``surface_altitude`` coordinate also varies over time.

>>> cubes.merge_cube()
Traceback (most recent call last):
...
iris.exceptions.MergeError: failed to merge into a single cube.
Coordinates in cube.aux_coords (non-scalar) differ: surface_altitude.
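
The error names ``surface_altitude`` as the coordinate blocking the merge. If you want to confirm that its values
really do differ between the two cubes, a quick check along these lines works (an optional aside; the result shown as
a comment is what we expect from the data described above):

.. code-block:: python

    import numpy as np

    # The surface_altitude values differ between the two time steps,
    # which is exactly what prevents the merge.
    np.array_equal(
        cubes[0].coord("surface_altitude").points,
        cubes[1].coord("surface_altitude").points,
    )  # -> False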

Since ``surface_altitude`` is preventing the merge, we need a way of combining these cubes that *explicitly* combines
the ``surface_altitude`` coordinate as well, so that it too varies along the time dimension. We can do this by first
adding a new leading dimension to each cube *and* to its ``surface_altitude`` coordinate using
:func:`~iris.util.new_axis`, and then concatenating the resulting cubes. We can attempt this as follows:

>>> from iris.util import new_axis
>>> from iris.cube import CubeList
>>> test_cubes = CubeList([new_axis(cube, scalar_coord="time", expand_extras=["surface_altitude"]) for cube in cubes])
>>> test_cubes.concatenate_cube()
Traceback (most recent call last):
...
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
Scalar coordinates values or metadata differ: forecast_period != forecast_period

This error alerts us that the ``forecast_period`` coordinate is also varying over time. To get concatenation to work,
we will have to expand this coordinate as well, by passing it to the ``expand_extras`` keyword.

>>> processed_cubes = CubeList(
... [new_axis(cube, scalar_coord="time", expand_extras=["surface_altitude", "forecast_period"]) for cube in cubes]
... )
>>> result = processed_cubes.concatenate_cube()
>>> print(result)
x_wind / (m s-1) (time: 2; model_level_number: 5; latitude: 144; longitude: 192)
Dimension coordinates:
time x - - -
model_level_number - x - -
latitude - - x -
longitude - - - x
Auxiliary coordinates:
forecast_period x - - -
surface_altitude x - x x
level_height - x - -
sigma - x - -
Derived coordinates:
altitude x x x x
Scalar coordinates:
forecast_reference_time 2006-01-01 00:00:00
Cell methods:
0 time: mean (interval: 1 hour)
Attributes:
STASH m01s00i002
source 'Data from Met Office Unified Model'
um_version '12.1'

Note that, since the derived coordinate ``altitude`` is derived from ``surface_altitude``, both of these coordinates
vary along the time dimension in the combined cube.
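
One way to see this is to check which data dimensions each coordinate spans. The following sketch uses
:meth:`~iris.cube.Cube.coord_dims`; the results shown as comments simply restate the cube summary printed above:

.. code-block:: python

    # surface_altitude spans time, latitude and longitude ...
    result.coord_dims("surface_altitude")  # (0, 2, 3)

    # ... so the derived altitude coordinate spans the time dimension too.
    result.coord_dims("altitude")  # (0, 1, 2, 3)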

Controlling over multiple dimensions
------------------------------------

We now consider a more complex case. Instead of loading 2 files across different time steps, we now load 15 such files.
Each of these files covers a month's time step; however, the ``surface_altitude`` coordinate changes every year. The
files span 3 years, so there are 3 different ``surface_altitude`` coordinates.

>>> filename = iris.sample_data_path('time_varying_hybrid_height', '*.pp')
>>> cubes = iris.load(filename, constraints="x_wind")
>>> print(cubes)
0: x_wind / (m s-1) (time: 2; model_level_number: 5; latitude: 144; longitude: 192)
1: x_wind / (m s-1) (time: 12; model_level_number: 5; latitude: 144; longitude: 192)
2: x_wind / (m s-1) (model_level_number: 5; latitude: 144; longitude: 192)

When :func:`iris.load` attempts to merge these cubes, it creates a cube for every unique ``surface_altitude``
coordinate. Note that since there is only one time point associated with the last cube, its ``time`` coordinate has
not been promoted to a dimension. The ``surface_altitude`` in each of the above cubes is 2D; however, since some of
these cubes already have a ``time`` dimension, it is not possible to use :func:`~iris.util.new_axis` to promote
``surface_altitude`` as we did above. In order to fully control the merge process, we instead use
:func:`iris.load_raw`:

>>> raw_cubes = iris.load_raw(filename, constraints="x_wind")
>>> print(raw_cubes)
0: x_wind / (m s-1) (latitude: 144; longitude: 192)
1: x_wind / (m s-1) (latitude: 144; longitude: 192)
...
73: x_wind / (m s-1) (latitude: 144; longitude: 192)
74: x_wind / (m s-1) (latitude: 144; longitude: 192)

The raw cubes are also separated along the ``model_level_number`` dimension. In this instance, we will need to
merge/concatenate along two different dimensions. Specifically, we can merge along the ``model_level_number``
dimension, since ``surface_altitude`` does not vary along it, and we can concatenate along the ``time`` dimension as
before. As in the previous example, we first expand the ``time`` dimension:

>>> processed_raw_cubes = CubeList(
... [new_axis(cube, scalar_coord="time", expand_extras=["surface_altitude", "forecast_period"]) for cube in raw_cubes]
... )
>>> print(processed_raw_cubes)
0: x_wind / (m s-1) (time: 1; latitude: 144; longitude: 192)
1: x_wind / (m s-1) (time: 1; latitude: 144; longitude: 192)
...
73: x_wind / (m s-1) (time: 1; latitude: 144; longitude: 192)
74: x_wind / (m s-1) (time: 1; latitude: 144; longitude: 192)

Then we merge along the ``model_level_number`` dimension. Note that merging these cubes does not affect the ``time``
dimension, since merging only combines cubes along scalar coordinates, not along dimension coordinates of length 1.

>>> merged_cubes = processed_raw_cubes.merge()
>>> print(merged_cubes)
0: x_wind / (m s-1) (model_level_number: 5; time: 1; latitude: 144; longitude: 192)
1: x_wind / (m s-1) (model_level_number: 5; time: 1; latitude: 144; longitude: 192)
...
13: x_wind / (m s-1) (model_level_number: 5; time: 1; latitude: 144; longitude: 192)
14: x_wind / (m s-1) (model_level_number: 5; time: 1; latitude: 144; longitude: 192)

Once merged, we can concatenate these cubes together along the ``time`` dimension:

>>> result = merged_cubes.concatenate_cube()
>>> print(result)
x_wind / (m s-1) (model_level_number: 5; time: 15; latitude: 144; longitude: 192)
Dimension coordinates:
model_level_number x - - -
time - x - -
latitude - - x -
longitude - - - x
Auxiliary coordinates:
level_height x - - -
sigma x - - -
forecast_period - x - -
surface_altitude - x x x
Derived coordinates:
altitude x x x x
Scalar coordinates:
forecast_reference_time 2006-01-01 00:00:00
Cell methods:
0 time: mean (interval: 1 hour)
Attributes:
STASH m01s00i002
source 'Data from Met Office Unified Model'
um_version '12.1'
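
The whole recipe can be captured in a small helper function. The following is an illustrative sketch only (the
function name and signature are our own, not part of the Iris API), combining the steps shown above:

.. code-block:: python

    import iris
    from iris.cube import CubeList
    from iris.util import new_axis

    def combine_time_varying(path, name, expand=("surface_altitude", "forecast_period")):
        """Load raw fields and combine them over model_level_number and time.

        Promote the scalar ``time`` coordinate on each raw cube (expanding the
        time-varying auxiliary coordinates with it), merge along
        ``model_level_number``, then concatenate along ``time``.
        """
        raw_cubes = iris.load_raw(path, constraints=name)
        processed = CubeList(
            [
                new_axis(cube, scalar_coord="time", expand_extras=list(expand))
                for cube in raw_cubes
            ]
        )
        return processed.merge().concatenate_cube()

    # For example:
    # result = combine_time_varying(
    #     iris.sample_data_path("time_varying_hybrid_height", "*.pp"), "x_wind"
    # )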

.. todo::
Mention the work done in #6168
docs/src/further_topics/index.rst (2 additions, 1 deletion)

@@ -18,4 +18,5 @@ Extra information on specific technical issues.
netcdf_io
dask_best_practices/index
ugrid/index
which_regridder_to_use
controlling_merge