cc @Huite who's thought about this for unstructured grids.
-
I don't like option (2), which would allow for indexes without coordinate variables. This seems like it breaks an important invariant (all indexes are also coordinates). I think auxiliary dimensions on DataArray indexes/coordinates would be fine in principle, but we would need an updated rule for deciding when to keep a dimension around versus when to drop it. The current rule is "only keep coordinates/indexes as long as their dimensions are also on the DataArray itself". I see at least three ways to do this:
Of these options, (2) and/or (3) are most appealing to me, because I doubt we can come up with new rules that would work well in every case.
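To make the current rule concrete, here is a minimal pure-Python sketch (not Xarray code) that models each coordinate only by the tuple of dimensions it spans, applied to the staggered-grid example from the original post. The helper name `coords_kept_by_current_rule` is hypothetical, and treating the grid index as surviving only when all of its coordinates survive is a simplifying assumption of this toy model.

```python
from typing import Dict, Tuple

Dims = Tuple[str, ...]


def coords_kept_by_current_rule(coords: Dict[str, Dims], var_dims: Dims) -> Dict[str, Dims]:
    """Keep a coordinate only if all of its dimensions are dimensions of the variable."""
    var_set = set(var_dims)
    return {name: dims for name, dims in coords.items() if set(dims) <= var_set}


# Staggered grid: center nodes (x_c, y_c) and left nodes (x_g, y_g).
dataset_coords: Dict[str, Dims] = {
    "x_c": ("x_c",),
    "x_g": ("x_g",),
    "y_c": ("y_c",),
    "y_g": ("y_g",),
}
temp_dims: Dims = ("x_c", "y_c")  # dims of the "temp" data variable

kept = coords_kept_by_current_rule(dataset_coords, temp_dims)
print(sorted(kept))  # ['x_c', 'y_c']: x_g and y_g are dropped

# Toy assumption: the grid index needs all four coordinates to stay consistent.
grid_index_coords = {"x_c", "x_g", "y_c", "y_g"}
index_survives = grid_index_coords <= set(kept)
print(index_survives)  # False
```

Under this rule the grid index can only be dropped (option 1 in the original post), which is why a new rule is needed.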
-
Thanks @dcherian for the cc. I've been working on some selection for unstructured grids, via the UGRID conventions. From this point of view, auxiliary dimensions feel the most straightforward to me. For a 2D unstructured triangular mesh topology:

```python
>>> da = ds["temp"]
>>> da.dims
{"mesh2d_n_face": 4}
>>> da.aux_dim
{"mesh2d_n_edge": 11, "mesh2d_n_node": 7, "mesh2d_n_node_per_face": 3, "two": 2}
>>> da
<xarray.DataArray "temp" (mesh2d_n_face: 4)>
...
Coordinates:
  * mesh2d_face_node_connectivity (mesh2d_n_face, mesh2d_n_node_per_face) int64 ...
  * mesh2d_node_x (mesh2d_n_node) float64 ...
  * mesh2d_node_y (mesh2d_n_node) float64 ...
  * mesh2d_edge_node_connectivity (mesh2d_n_edge, two) int64 ...
Dimensions: mesh2d_n_face
Auxiliary Dimensions: mesh2d_n_edge, mesh2d_n_node, mesh2d_n_node_per_face, two
Indexes:
  ┌ mesh2d_face_node_connectivity UgridIndex
  │ mesh2d_node_x
  │ mesh2d_node_y
  └ mesh2d_edge_node_connectivity
```

For this data, you would only be able to broadcast / reduce on the non-auxiliary dimension (no direct interaction, as @benbovy mentions). The index, the auxiliary dims and the associated coords would be dropped together with the non-auxiliary dim (`mesh2d_n_face`). It might happen that multiple auxiliary dimensions are required, e.g., when adding layers and bounds:

```python
>>> da = ds["temp"]
>>> da.dims
{"layer": 3, "mesh2d_n_face": 4}
>>> da.aux_dim
{"mesh2d_n_edge": 11, "mesh2d_n_node": 7, "mesh2d_n_node_per_face": 3, "two": 2, "layer_n_bound": 2}
>>> da
<xarray.DataArray "temp" (layer: 3, mesh2d_n_face: 4)>
...
Coordinates:
  * mesh2d_face_node_connectivity (mesh2d_n_face, mesh2d_n_node_per_face) int64 ...
  * mesh2d_node_x (mesh2d_n_node) float64 ...
  * mesh2d_node_y (mesh2d_n_node) float64 ...
  * mesh2d_edge_node_connectivity (mesh2d_n_edge, two) int64 ...
  * layer_bounds (layer, mesh2d_n_face, layer_n_bound) float64 ...
Dimensions: layer, mesh2d_n_face
Auxiliary Dimensions: mesh2d_n_edge, mesh2d_n_node, mesh2d_n_node_per_face, two, layer_n_bound
Indexes:
  ┌ mesh2d_face_node_connectivity UgridIndex
  │ mesh2d_node_x
  │ mesh2d_node_y
  └ mesh2d_edge_node_connectivity
  - layer_bounds IntervalIndex  # or something
```

For a dataset, a `set_aux_dims` method could be used:

```python
ds = ds.set_aux_dims({
    "layer": ("layer_n_bound",),
    "mesh2d_n_face": ("mesh2d_n_edge", "mesh2d_n_node", "mesh2d_n_node_per_face", "two"),
})
```

Or indeed an optional argument to the `set_xindex` method:

```python
ds = ds.set_xindex(
    coord_names=(
        "mesh2d_face_node_connectivity",
        "mesh2d_node_x",
        "mesh2d_node_y",
        "mesh2d_edge_node_connectivity",
    ),
    aux_dims={
        "mesh2d_n_face": ("mesh2d_n_edge", "mesh2d_n_node", "mesh2d_n_node_per_face", "two"),
    },
)
```

Then [...] In the example above, both [...] This seems like the most explicit, least magical way to me. It would cover the needs for dealing with UGRID unstructured grids, and I think it would work for bounds coordinates as well. What other types of coordinates require auxiliary dimensions?
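A minimal pure-Python sketch of how the extended keep/drop rule could look under this proposal, assuming a mapping like the one passed to the proposed `set_aux_dims` (each dimension maps to the auxiliary dimensions it carries; `mesh2d_n_node` is used consistently for the node dimension). Coordinates are modelled only by their dimensions, and `visible_dims` / `propagate` are hypothetical helpers, not Xarray functions. A coordinate is kept if each of its dimensions is either a dimension of the variable or an auxiliary dimension attached to one, so dropping a dimension (e.g., by reducing over it) also drops its auxiliary dimensions and the coordinates that need them.

```python
from typing import Dict, Set, Tuple

Dims = Tuple[str, ...]


def visible_dims(var_dims: Dims, aux_map: Dict[str, Dims]) -> Set[str]:
    """Dimensions of the variable plus the auxiliary dimensions attached to them."""
    dims = set(var_dims)
    for dim in var_dims:
        dims.update(aux_map.get(dim, ()))
    return dims


def propagate(coords: Dict[str, Dims], var_dims: Dims, aux_map: Dict[str, Dims]) -> Dict[str, Dims]:
    """Keep coordinates whose dimensions are all visible from the variable."""
    allowed = visible_dims(var_dims, aux_map)
    return {name: dims for name, dims in coords.items() if set(dims) <= allowed}


aux_map: Dict[str, Dims] = {
    "layer": ("layer_n_bound",),
    "mesh2d_n_face": ("mesh2d_n_edge", "mesh2d_n_node", "mesh2d_n_node_per_face", "two"),
}
coords: Dict[str, Dims] = {
    "mesh2d_face_node_connectivity": ("mesh2d_n_face", "mesh2d_n_node_per_face"),
    "mesh2d_node_x": ("mesh2d_n_node",),
    "mesh2d_node_y": ("mesh2d_n_node",),
    "mesh2d_edge_node_connectivity": ("mesh2d_n_edge", "two"),
    "layer_bounds": ("layer", "mesh2d_n_face", "layer_n_bound"),
}

# Full "temp" variable: every coordinate is kept.
full = propagate(coords, ("layer", "mesh2d_n_face"), aux_map)
# After reducing over "layer": the UGRID coordinates stay, layer_bounds is dropped.
no_layer = propagate(coords, ("mesh2d_n_face",), aux_map)
# After reducing over "mesh2d_n_face": its auxiliary dims and all mesh coordinates go away.
no_face = propagate(coords, ("layer",), aux_map)

print(sorted(no_layer))
print(sorted(no_face))  # []
```

The design choice here is that auxiliary dimensions are owned by a regular dimension, so their lifetime is tied to it rather than decided case by case.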
-
Thanks for writing this up @benbovy! My proposal was for (2) BUT with explicit propagation of all coordinate variables needed by the indexes associated with the DataArray's dims:

```python
>>> da = ds["temp"]
>>> da
<xarray.DataArray "temp" (x_c: 9, y_c: 9)>
...
Coordinates:
  * x_c (x_c) float64 ...
  * x_g (x_g) float64 ...
  * y_c (y_c) float64 ...
  * y_g (y_g) float64 ...
Indexes:
  ┌ x_c GridIndex
  │ x_g
  │ y_c
  └ y_g
```

Some things I like about this are:

I'd prefer the public message be (c) instead of this one (d). It seems a lot easier to communicate [...]

I think this proposal is basically what @benbovy is saying here:
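A minimal pure-Python sketch of that propagation rule, assuming an index counts as "associated with" a DataArray dimension when at least one of its coordinates lies along that dimension. Indexes are modelled as plain sets of coordinate names, and `propagate_with_indexes` is a hypothetical helper, not Xarray code. The unrelated `time` coordinate is included only to show that indexes not touching the variable's dimensions are still dropped.

```python
from typing import Dict, List, Set, Tuple

Dims = Tuple[str, ...]


def propagate_with_indexes(
    coords: Dict[str, Dims], indexes: List[Set[str]], var_dims: Dims
) -> Dict[str, Dims]:
    var_set = set(var_dims)
    # Current rule: coordinates whose dimensions are all on the variable.
    keep = {name for name, dims in coords.items() if set(dims) <= var_set}
    # Extension: pull in every coordinate of an index that touches a variable dimension.
    for index_coords in indexes:
        if any(set(coords[name]) & var_set for name in index_coords):
            keep |= index_coords
    return {name: dims for name, dims in coords.items() if name in keep}


coords: Dict[str, Dims] = {
    "x_c": ("x_c",),
    "x_g": ("x_g",),
    "y_c": ("y_c",),
    "y_g": ("y_g",),
    "time": ("time",),
}
indexes = [{"x_c", "x_g", "y_c", "y_g"}, {"time"}]  # one GridIndex, one plain index

result = propagate_with_indexes(coords, indexes, ("x_c", "y_c"))
print(list(result))  # ['x_c', 'x_g', 'y_c', 'y_g']
```

Note that `x_g` and `y_g` end up attached to a DataArray whose main variable has no `x_g` or `y_g` dimension, which is exactly what the current DataArray data model does not allow.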
-
This has been discussed during the last Xarray community developers meeting. Briefly summarized, the problem is that multi-coordinate indexes may not be propagated properly in DataArray objects, since the dimensions of a DataArray must correspond to those of its main array variable.
For example, let's consider this Dataset:
The `x_g`, `y_g` and `x_c`, `y_c` dimension coordinates respectively represent the left and center node positions of a staggered grid with two physical axes, X and Y. They are all backed by a `GridIndex` that allows grid-aware operations using Xarray's API directly. The `temp` data variable represents a scalar field on the grid (center nodes).

When handling the `temp` variable separately as a DataArray object, we only keep the `x_c` and `y_c` dimensions of that variable, i.e., we lose the explicit relationship between the grid index and its `x_g`, `y_g` coordinates.

How to deal with this? Some ideas were suggested at the meeting. Let me try to outline those (+ other) options below.
cc @dcherian @keewis @shoyer @TomNicholas
1. Drop the index
This is the easiest option, but it is not convenient at all.
2. Keep the index
Propagate the index as-is.
Here the `GridIndex` is the same object as in the Dataset. It still contains all the grid information (e.g., it could wrap a `PandasIndex` for each of the `x_g`, `y_g` and `x_c`, `y_c` coordinates), but it has only two explicit coordinate references in the extracted DataArray. When using the index via the DataArray, the whole grid information is still used and possibly updated, e.g.:

Other operations may not be that straightforward, though. For example, converting the DataArray back to a Dataset may be ambiguous:
3. Separate (but related) indexes
E.g., for the example above, have two separate indexes for the center and left node positions, respectively:
Here, `GridCenterIndex` and `GridLeftIndex` would somehow point to each other. It is not very clear to me how this would work, though.
4. DataArray "auxiliary" dimensions
The DataArray data model would be augmented by the introduction of "auxiliary dimensions", i.e., all dimensions that are present in the DataArray coordinates but not in the main variable. This would allow propagating all index coordinates without touching the dimensions of the DataArray.
This would work very similarly to option 2, except that it is a bit more explicit (converting back to a Dataset would look less magical).
Auxiliary dimensions would not be very useful by themselves; they would only carry information that is propagated. Support would also be very limited, i.e., no direct interaction with them would be allowed (e.g., `da.isel(aux_dim=...)` would not be allowed).

I haven't thought much about how easy or hard it would be to implement this, though. I'm not sure what kind of technical difficulties we would encounter.
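To illustrate the limited support described for this option, here is a toy pure-Python container (hypothetical, not Xarray's `DataArray`) that tracks regular and auxiliary dimension sizes separately and refuses positional indexing along auxiliary dimensions. The grid sizes are made up for the example, and a real grid-aware index would probably also update the auxiliary coordinates when indexing, which this toy does not do.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyDataArray:
    """Hypothetical stand-in for a DataArray with "auxiliary" dimensions."""

    sizes: Dict[str, int]
    aux_sizes: Dict[str, int] = field(default_factory=dict)

    def isel(self, **indexers: slice) -> "ToyDataArray":
        for dim in indexers:
            if dim in self.aux_sizes:
                raise ValueError(f"cannot index along auxiliary dimension {dim!r}")
            if dim not in self.sizes:
                raise KeyError(dim)
        # Slicing changes the size of regular dimensions only.
        new_sizes = {
            dim: len(range(size)[indexers[dim]]) if dim in indexers else size
            for dim, size in self.sizes.items()
        }
        # Auxiliary dimensions are just carried along unchanged (toy simplification).
        return ToyDataArray(new_sizes, dict(self.aux_sizes))


temp = ToyDataArray(sizes={"x_c": 9, "y_c": 9}, aux_sizes={"x_g": 9, "y_g": 9})
subset = temp.isel(x_c=slice(0, 3))
print(subset.sizes)  # {'x_c': 3, 'y_c': 9}

try:
    temp.isel(x_g=slice(0, 3))
except ValueError as err:
    print(err)  # cannot index along auxiliary dimension 'x_g'
```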
5. Coordinates "auxiliary" dimensions
Very similar to option 4, but addresses the problem at the level of Xarray `Coordinates` (once we refactor Xarray to put both indexes and coordinate variables into a unique `Coordinates` container encapsulated in Dataset / DataArray).