-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAS-1177: Add functions and CF overrides to create artificial bounds … #5
Changes from 1 commit
2244c71
02a6c37
71c39c1
fec7cf0
975861d
ca3bd74
7a872fd
6aa595a
b906d2a
e5f64d3
1086e14
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,18 @@ | ||
## v1.0.2 | ||
### 2024-2-26 | ||
|
||
This version of HOSS correctly handles edge-aligned geographic collections by | ||
adding the attribute `cell_alignment` with the value `edge` to `hoss_config.json` | ||
for edge-aligned collections (namely, ATL16), and by adding functions that | ||
create pseudo bounds for edge-aligned collections to make HOSS use the | ||
`dimension_utilities.py` function, `get_dimension_indices_from_bounds`. | ||
|
||
This change also includes an addition of a CF override that addresses an | ||
issue with the ATL16 metadata for the variables `/spolar_asr_obs_grid` and | ||
`/spolar_lorate_blowing_snow_freq` where their `grid_mapping` attribute points | ||
to north polar variables instead of south polar variables. This CF Override | ||
will have to be removed if/when the metadata is corrected. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: It's a bit of a strong statement to say it will have to be removed. It can be (and I think it would be good to do so), but it's unlikely that it would break anything by having an override that has the same information as the attribute it's overriding. |
||
|
||
## v1.0.1 | ||
### 2023-12-19 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
1.0.1 | ||
1.0.2 |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -19,7 +19,7 @@ | |||||
from harmony.message import Message | ||||||
from harmony.message_utility import rgetattr | ||||||
from harmony.util import Config | ||||||
from varinfo import VarInfoFromDmr | ||||||
from varinfo import VarInfoFromDmr, VariableFromDmr | ||||||
|
||||||
from hoss.bbox_utilities import flatten_list | ||||||
from hoss.exceptions import InvalidNamedDimension, InvalidRequestedRange | ||||||
|
@@ -75,8 +75,143 @@ def prefetch_dimension_variables(opendap_url: str, varinfo: VarInfoFromDmr, | |||||
|
||||||
logger.info('Variables being retrieved in prefetch request: ' | ||||||
f'{format_variable_set_string(required_dimensions)}') | ||||||
return get_opendap_nc4(opendap_url, required_dimensions, output_dir, | ||||||
logger, access_token, config) | ||||||
|
||||||
required_dimensions_nc4 = get_opendap_nc4(opendap_url, | ||||||
required_dimensions, output_dir,logger, access_token, config) | ||||||
|
||||||
# Create bounds variables if necessary. | ||||||
add_bounds_variables(required_dimensions_nc4, required_dimensions, | ||||||
varinfo, logger) | ||||||
|
||||||
return required_dimensions_nc4 | ||||||
|
||||||
|
||||||
def add_bounds_variables(dimensions_nc4: str, | ||||||
required_dimensions: Set[str], | ||||||
varinfo: VarInfoFromDmr, | ||||||
logger: Logger) -> None: | ||||||
""" Augment a NetCDF4 file with artificial bounds variables for each | ||||||
dimension variable that is edge-aligned and does not already | ||||||
have bounds variables. | ||||||
|
||||||
For each dimension variable: | ||||||
(1) Check if the variable needs a bounds variable. | ||||||
(2) If so, create a bounds array of minimum and maximum values. | ||||||
(3) Then write the bounds variable to the NetCDF4 URL. | ||||||
|
||||||
""" | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
with Dataset(dimensions_nc4, 'r+') as datasets: | ||||||
for dimension_name in required_dimensions: | ||||||
dimension_variable = varinfo.get_variable(dimension_name) | ||||||
if needs_bounds(dimension_variable) is True: | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is more pythonic to just test the returned value here:
Suggested change
ref: https://peps.python.org/pep-0008/#programming-recommendations There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for attaching the reference! I see that one there. |
||||||
min_and_max_bounds = create_bounds(datasets, dimension_name) | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
write_bounds(datasets, dimension_variable, min_and_max_bounds) | ||||||
|
||||||
logger.info('Artificial bounds added for dimension variable: ' | ||||||
f'{dimension_name}') | ||||||
|
||||||
|
||||||
def needs_bounds(dimension: VariableFromDmr) -> bool: | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
""" Check if a dimension variable needs a bounds variable. | ||||||
This will be the case when dimension cells are edge-aligned | ||||||
and bounds for that dimension do not already exist. | ||||||
|
||||||
""" | ||||||
|
||||||
return dimension.attributes['cell_alignment'] == 'edge' and dimension.references.get('bounds') == None | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
|
||||||
def create_bounds(dimension_dataset: Dataset, | ||||||
dimension_path: str) -> np.ndarray: | ||||||
""" Create an array containing the minimum and maximum bounds | ||||||
for a given dimension. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again - great level of description. If I was being a pedant (which I usually am), then it might be good to specifically say these are the minimum and maximum bounds for each pixel in a given dimension. |
||||||
|
||||||
The minimum and maximum values are determined under the assumption | ||||||
that the dimension data is monotonically increasing and contiguous. | ||||||
So for every bounds but the last, the bounds are simply extracted | ||||||
from the dimension dataset. | ||||||
|
||||||
The final bounds must be calculated with the assumption that | ||||||
the last data cell is edge-aligned and thus has a value the does | ||||||
not account for the cell length. So, the final bound is determined | ||||||
by taking the median of all the resolutions in the dataset to obtain | ||||||
a resolution that can be added to the final data value. | ||||||
|
||||||
Ex: Input dataset with resolution of 3 degrees: [ ... , 81, 84, 87] | ||||||
|
||||||
Minimum | Maximum | ||||||
<...> <...> | ||||||
81 84 | ||||||
84 87 | ||||||
87 ? -> 87 + median resolution -> 87 + 3 -> 90 | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Brilliant example here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a dumb question about edge-alignment. Is it always the case that when something is edge aligned, the values are such that they represent the left side of the gridcell? or in the case of a vertical y-positive axis dimension, the bottom? I'm probably confusing this with something else, but I can't find the reference at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know that it's always the case, but I think that it's what we've seen. I tried to find a quick reference, but couldn't. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For ATL16, the origin is bottom left. That's why. Most products are top left |
||||||
|
||||||
""" | ||||||
# Access the dimension variable's data using the variable's full path. | ||||||
dimension_data = dimension_dataset[dimension_path][:] | ||||||
|
||||||
# Determine the dimension's resolution by taking the median value | ||||||
# of the differences between each ascending data value. | ||||||
dimension_array = np.array(dimension_data) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little confused - isn't There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You appear to be correct. Removing! |
||||||
|
||||||
# Build array. | ||||||
size = dimension_array.size | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This feels like another redundant variable - just use |
||||||
min_max_pairs = [[dimension_array[idx], | ||||||
dimension_array[idx+1]] | ||||||
for idx in range(0, size-1)] | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels like you could avoid using lists at all and just go straight for a Something like:
The transpose fun-and-games is a bit of a faff, but I think it makes the initial assignment easier, but then you need to swap back to the expected shape of (N, 2), rather than (2, N). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I took a moment to review this and test it out, and it looks like a simple, good approach. Neat! |
||||||
|
||||||
# Calculate final values. | ||||||
dim_resolution = np.median(np.diff(dimension_array)) | ||||||
min_max_pairs.append([dimension_array[size-1], | ||||||
dimension_array[size-1] + dim_resolution]) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume this is tested, so I'm suspicious of this observation, but... On line 160 above the min_max_pairs are being populated for the range (0 .. size-1). And then, here, an additional value is being populated. Wouldn't that be one beyond what we want to populate? Also on line 159, referencing dimension_array[idx+1] for idx = (size-1) - wouldn't that be one beyond the range of dimension_array? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh right! python ranges only include up until the second value, but don't include that value itself. So the final So if the size of the dimension array is, say, 60, the loop will write pair values up to index 58, then the final value calculation adds the |
||||||
|
||||||
return np.array(min_max_pairs) | ||||||
|
||||||
|
||||||
def write_bounds(dimension_dataset: Dataset, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This probably sounds like a nit, but it's something that confused me reading this code over: Can we use a different variable name to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh that is more descriptive. Thank you! |
||||||
dimension_variable: VariableFromDmr, | ||||||
min_and_max_bounds: np.ndarray) -> None: | ||||||
""" Write the input bounds array to a given dimension dataset. | ||||||
|
||||||
First a new dimension is created for the new bounds variable | ||||||
to allow the variable to be two-dimensional. | ||||||
|
||||||
Then the new bounds variable is created using two dimensions: | ||||||
(1) the existing dimension of the dimension dataset, and | ||||||
(2) the new bounds variable dimension. | ||||||
|
||||||
""" | ||||||
# Create the second bounds dimension. | ||||||
dimension_full_name_path = dimension_variable.full_name_path | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
dimension_group = '/' + '/'.join(dimension_full_name_path.split('/')[1:-1]) | ||||||
dimension_name = dimension_full_name_path.split('/')[-1] | ||||||
|
||||||
# Consider the special case when the dimension group is the root directory. | ||||||
# The dimension can't refer to the full path in the name itself, so we have | ||||||
# to create it with respect to the group we want to place it in. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Totally just a) Nitpick: Maybe this comment could be truncated a tiny bit, and as it is specific to the first condition in the
|
||||||
if dimension_group == '/': | ||||||
bounds_dim = dimension_dataset.createDimension(dimension_name + '_bnds_dim', 2) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The standard naming convention is generally something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, because this string is used in a couple of places, you should probably declare it as a variable and then refer to that variable. (Just so that if the definition changes in the future, you only have to update one place) |
||||||
else: | ||||||
bounds_dim = dimension_dataset[dimension_group].createDimension(dimension_name + '_bnds_dim', 2) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It appears the There's gotta be a better way to do this, but I wrestled with the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah - this is frustrating. I didn't find a quick alternative. Maybe there's a cleverer way to split the dimension basename off the group with sting manipulation above, but it seems like the logic you have here to do one thing for the root group, and another for a nested one is the right way to go. (It's also really frustrating that |
||||||
|
||||||
# Dimension variables only have one dimension - themselves. | ||||||
variable_dimension = dimension_dataset[dimension_full_name_path].dimensions[0] | ||||||
|
||||||
bounds_data_type = str(dimension_variable.data_type) | ||||||
bounds = dimension_dataset.createVariable(dimension_full_name_path + '_bnds', | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar comment about reusing the bounds name - you define it on L211 - could you move that declaration up and use it here, too. (Or the base name from that declaration, at least) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did I update this like you were imagining? Since the last two variable names were independent of each other I wasn't entirely sure. |
||||||
bounds_data_type, | ||||||
(variable_dimension, | ||||||
bounds_dim,)) | ||||||
# Write data to dataset file. | ||||||
size = len(min_and_max_bounds) | ||||||
for idx in range(0, size): | ||||||
bounds[idx, 0] = (min_and_max_bounds[idx])[0] | ||||||
bounds[idx, 1] = (min_and_max_bounds[idx])[1] | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
# Update varinfo attributes and references. | ||||||
bounds_name = dimension_name + '_bnds' | ||||||
dimension_dataset[dimension_full_name_path].setncatts({'bounds': bounds_name}) | ||||||
dimension_variable.references['bounds'] = {bounds_name,} | ||||||
dimension_variable.attributes['bounds'] = bounds_name | ||||||
owenlittlejohns marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
|
||||||
def is_dimension_ascending(dimension: MaskedArray) -> bool: | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -245,6 +245,20 @@ | |
], | ||
"_Description": "Ensure variables in /Soil_Moisture_Retrieval_Data_Polar_PM group point to correct coordinate variables." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
"ShortNamePath": "ATL16", | ||
"Variable_Pattern": ".*_grid_(lat|lon)" | ||
}, | ||
"Attributes": [ | ||
{ | ||
"Name": "cell_alignment", | ||
"Value": "edge" | ||
} | ||
], | ||
"_Description": "ATL16 has edge-aligned grid cells." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
|
@@ -357,6 +371,19 @@ | |
} | ||
], | ||
"_Description": "Ensure the latitude and longitude dimension variables know their associated grid_mapping variable." | ||
}, | ||
{ | ||
"Applicability": { | ||
"Mission": "ICESat2", | ||
"ShortNamePath": "ATL16", | ||
"Variable_Pattern": "/spolar_(asr_obs_grid|lorate_blowing_snow_freq)" | ||
}, | ||
"Attributes": [ | ||
{ | ||
"Name": "grid_mapping", | ||
"Value": "crs_latlon: spolar_grid_lat crs_latlon: spolar_grid_lon" | ||
} | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This entry addresses what we think is a mistake in the ATL16 v004 data for the two indicated variables, where the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the record, that grid_mapping value is referencing the crs_latlon grid-mapping variable for both horizontal dimensions (lat, lon), here explicitly identified in the grid_mapping reference value. A simpler entry would be, I think, to specify "crs_latlon" as a grid_mapping variable that implicitly applies to all dimensions. This posted override is in keeping with the other ATL16 grid_mapping reference attributes that explicitly call out the dimensional applicability. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Totally agree that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I almost wonder if HOSS should be a bit less "diligent" in reading those attributes and instead use other heuristics (likely already in use) for identifying the relevant horizontal dimensions for grid mapping references. They seem superfluous in this case, and then they got them wrong! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it HOSS reading those attributes, or varinfo? I saw that varinfo specifically checks the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point @lyonthefrog - it is |
||
} | ||
], | ||
"CF_Supplements": [ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these are our public release notes, it might be worth explicitly stating in some way that the pseudo-bounds variables are not returned in the actual HOSS output.