DAS-1177: Add functions and CF overrides to create artificial bounds for collections with edge-alignment #5

Conversation
"Name": "grid_mapping", | ||
"Value": "crs_latlon: spolar_grid_lat crs_latlon: spolar_grid_lon" | ||
} | ||
] |
This entry addresses what we think is a mistake in the ATL16 v004 data for the two indicated variables, where the `grid_mapping` attribute references north polar dimensions instead of south polar dimensions. Without this CF override, all the subsets that include either of these variables would fail due to the existing bug captured in DAS-1804.
For the record, that grid_mapping value is referencing the crs_latlon grid-mapping variable for both horizontal dimensions (lat, lon), here explicitly identified in the grid_mapping reference value. A simpler entry would be, I think, to specify "crs_latlon" as a grid_mapping variable that implicitly applies to all dimensions. This posted override is in keeping with the other ATL16 grid_mapping reference attributes that explicitly call out the dimensional applicability.
Totally agree that `crs_latlon` is simpler, but it seems best to match the format of the other entries for `grid_mapping` that NSIDC is using (which is still a valid format per the CF Conventions).
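For illustration, here is a minimal sketch of how those two CF-valid forms could be written with the Python `netCDF4` library (the file is hypothetical; the variable names come from the ATL16 example above):

```python
from netCDF4 import Dataset

# Hypothetical file, used only to illustrate the two attribute forms.
with Dataset('example.nc4', 'r+') as ds:
    # Simple form: the grid-mapping variable implicitly applies to all of
    # the variable's spatial dimensions.
    ds['/spolar_asr_obs_grid'].setncattr('grid_mapping', 'crs_latlon')

    # Extended form (the style used in the override above): explicitly
    # pairs the grid-mapping variable with each horizontal coordinate.
    ds['/spolar_asr_obs_grid'].setncattr(
        'grid_mapping',
        'crs_latlon: spolar_grid_lat crs_latlon: spolar_grid_lon')
```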
I almost wonder if HOSS should be a bit less "diligent" in reading those attributes and instead use other heuristics (likely already in use) for identifying the relevant horizontal dimensions for grid mapping references. They seem superfluous in this case, and then they got them wrong!
Is it HOSS reading those attributes, or varinfo? I saw that varinfo specifically checks the `grid_mapping` attribute if one exists.
Good point @lyonthefrog - it is `VarInfoFromDmr` doing the reading. And I think we want to keep `earthdata-varinfo` as aligned to the standard (CF Conventions) as possible. This format of representation is valid per the CF Conventions, so I think we're okay. The only intrinsic error here was that the metadata was incorrect in the file. (Quirky Data!)
hoss/dimension_utilities.py
Outdated
if dimension_group == '/':
    bounds_dim = dimension_dataset.createDimension(dimension_name + '_bnds_dim', 2)
else:
    bounds_dim = dimension_dataset[dimension_group].createDimension(dimension_name + '_bnds_dim', 2)
It appears the `createDimension` function in the Python `netCDF4` library doesn't recognize full path names, so I placed the dimension in the right group via the group reference in front of the function. However, this doesn't appear to work when the dimension is in the root directory.
There's gotta be a better way to do this, but I wrestled with the `netCDF4` library for a while to no avail.
Yeah - this is frustrating. I didn't find a quick alternative. Maybe there's a cleverer way to split the dimension basename off the group with string manipulation above, but it seems like the logic you have here, to do one thing for the root group and another for a nested one, is the right way to go. (It's also really frustrating that `dimension_dataset['/']` doesn't work like other groups.)
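For reference, a minimal sketch of the workaround being discussed (the file and dimension names are hypothetical):

```python
from netCDF4 import Dataset

# createDimension() does not accept a full path such as
# '/group1/lat_bnds_dim', so the dimension has to be created on the group
# object that should contain it.
with Dataset('example.nc4', 'r+') as dimension_dataset:
    dimension_group = '/'  # group portion of the dimension variable's path
    dimension_name = 'npolar_grid_lat'

    if dimension_group == '/':
        # Indexing the Dataset with '/' does not return the root group the
        # way other group paths do, so the Dataset object is used directly.
        bounds_dim = dimension_dataset.createDimension(
            dimension_name + '_bnds_dim', 2)
    else:
        bounds_dim = dimension_dataset[dimension_group].createDimension(
            dimension_name + '_bnds_dim', 2)
```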
hoss/dimension_utilities.py
Outdated
# Calculate final values.
dim_resolution = np.median(np.diff(dimension_array))
min_max_pairs.append([dimension_array[size-1],
                      dimension_array[size-1] + dim_resolution])
I assume this is tested, so I'm suspicious of this observation, but... On line 160 above, the `min_max_pairs` are being populated for the range (0 .. size-1). And then, here, an additional value is being populated. Wouldn't that be one beyond what we want to populate? Also, on line 159, referencing `dimension_array[idx+1]` for idx = (size-1) - wouldn't that be one beyond the range of `dimension_array`?
Oh right! Python ranges include values up to, but not including, the second value. So the final `idx` value in the `min_max_pairs` array loop will be size-2, since the loop uses `range(0, size-1)`.
So if the size of the dimension array is, say, 60, the loop will write pair values up to index 58, then the final value calculation appends `[dimension_array[60-1], dimension_array[60-1] + res]`, which amounts to a final index of 59.
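A worked sketch of that indexing, using an illustrative 60-element array (the names mirror the PR code):

```python
import numpy as np

dimension_array = np.arange(0.0, 180.0, 3.0)  # 60 illustrative values
size = dimension_array.size

# range(0, size - 1) stops at idx == size - 2, so dimension_array[idx + 1]
# never reads beyond the end of the array.
min_max_pairs = [[dimension_array[idx], dimension_array[idx + 1]]
                 for idx in range(0, size - 1)]

# The last cell has no right-hand neighbour, so its upper bound is
# extrapolated using the median resolution.
dim_resolution = np.median(np.diff(dimension_array))
min_max_pairs.append([dimension_array[size - 1],
                      dimension_array[size - 1] + dim_resolution])

assert len(min_max_pairs) == size  # one bounds pair per dimension value
```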
This is exciting stuff!
A lot of my comments are about clarifying things with variable naming and maybe using `numpy` a bit more. (Also styling nits.)
The other really big thing is the need for unit tests (and fixing the tests that are now broken). We're going to talk about those later today, so they can be included in this PR.
hoss/dimension_utilities.py
Outdated
# The dimension can't refer to the full path in the name itself, so we have
# to create it with respect to the group we want to place it in.
if dimension_group == '/':
    bounds_dim = dimension_dataset.createDimension(dimension_name + '_bnds_dim', 2)
The standard naming convention is generally something like `latv`, `lonv`, or `nv` (for `lat`, `lon` and `time` dimensions). Probably the best thing would be to call this dimension `dimension_name + 'v'`.
Also, because this string is used in a couple of places, you should probably declare it as a variable and then refer to that variable. (Just so that if the definition changes in the future, you only have to update one place)
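A minimal sketch of that suggestion (the constant and helper names are illustrative, not from the PR):

```python
# Declared once, so a future rename only touches this line.
BOUNDS_DIMENSION_SUFFIX = 'v'


def get_bounds_dimension_name(dimension_name: str) -> str:
    """ Name of the artificial bounds dimension for a dimension variable. """
    return dimension_name + BOUNDS_DIMENSION_SUFFIX
```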
hoss/dimension_utilities.py
Outdated
variable_dimension = dimension_dataset[dimension_full_name_path].dimensions[0]

bounds_data_type = str(dimension_variable.data_type)
bounds = dimension_dataset.createVariable(dimension_full_name_path + '_bnds',
Similar comment about reusing the bounds name - you define it on L211 - could you move that declaration up and use it here, too? (Or the base name from that declaration, at least.)
Did I update this like you were imagining? Since the last two variable names were independent of each other I wasn't entirely sure.
…de with simpler/best practice code, and make a lot of minor formatting updates.
I had just a couple of pythonic things and simplifications. And just a few questions in general. I am trusting everyone's judgement on the hoss config file.
hoss/dimension_utilities.py
Outdated
with Dataset(dimensions_nc4, 'r+') as prefetch_dataset:
    for dimension_name in required_dimensions:
        dimension_variable = varinfo.get_variable(dimension_name)
        if needs_bounds(dimension_variable) is True:
I think it is more pythonic to just test the returned value here:
- if needs_bounds(dimension_variable) is True:
+ if needs_bounds(dimension_variable):
ref: https://peps.python.org/pep-0008/#programming-recommendations
Thanks for attaching the reference! I see that one there.
<...>  <...>
81     84
84     87
87     ?  -> 87 + median resolution -> 87 + 3 -> 90
I have a dumb question about edge-alignment.
Is it always the case that when something is edge-aligned, the values represent the left side of the grid cell? Or, in the case of a vertical y-positive axis dimension, the bottom?
I'm probably confusing this with something else, but I can't find the reference at all.
# Access the dimension variable's data using the variable's full path.
dimension_array = prefetch_dataset[dimension_path][:]

median_resolution = np.median(np.diff(dimension_array))
Doesn't the dimension array have constant differences between cells? I.e., value[N+1] - value[N] is the same for all N?
If so, this is probably doing more math than we need; we can just compute the difference between any two cells. If not, then this won't be precisely accurate - is this "good enough" for the subsetting we're going for?
> Is it always the case that when something is edge aligned, the values are such that they represent the left side of the gridcell?

No, I'm pretty sure any type of edge alignment is possible. We decided to stick with handling the lower-left alignment case since that's ATL16's alignment, and accounting for all the possible alignments could balloon very quickly. I think we're thinking/hoping that this is the most common edge-alignment too. I might have just made some of that up though - what say you @owenlittlejohns?
All that being said, this should probably be documented in the code, huh.
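A short sketch contrasting the two alignments for a 1-D dimension (the values and resolution are made up):

```python
import numpy as np

values = np.array([0.0, 3.0, 6.0])  # illustrative dimension values
resolution = 3.0

# Lower-left ("edge") alignment, the ATL16 case handled here: each value is
# the lower edge of its cell, so its bounds are [value, value + resolution].
edge_bounds = np.column_stack([values, values + resolution])

# Center alignment (the default assumption): each value is the cell
# midpoint, so its bounds straddle it.
center_bounds = np.column_stack([values - resolution / 2,
                                 values + resolution / 2])
```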
> Doesn't the dimension array have constant differences between cells?

@autydp and @owenlittlejohns looked over this calculation with me, and they figured the cell differences might not always be constant. So yes, what I currently have here is "good enough". They can weigh in, though!
I might still argue that the (N-2, N-1) difference is a better predictor of the (N-1, N) difference than the entire median, but I will not push it.
I think the main thing here is that there can be a bit of floating-point level variation. Maybe that's not super important for the very last pixel, but it's bitten us before at times for requests with bounding boxes hitting the exact boundary between pixels. My preference for the median diff comes from a combination of that, and that it's still all a pretty straightforward one-liner to get it with `numpy`.
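A quick sketch of the floating-point variation in question (the array is illustrative):

```python
import numpy as np

# Nominally constant 0.1 spacing, built by accumulation.
dimension_array = np.cumsum(np.full(20, 0.1))

diffs = np.diff(dimension_array)
print(diffs.min() == diffs.max())  # False: the diffs vary at the ULP level
print(np.median(diffs))            # robust single estimate of the resolution
```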
hoss/dimension_utilities.py
Outdated
                                 dimension_variable.full_name_path)

# Create the second bounds dimension.
dimension_group = '/' + '/'.join(dimension_variable.full_name_path.split('/')[1:-1])
Is this just removing the last item behind a path-like string? E.g., for `/group1/group2/variableb`:

full_name_path = '/group1/group2/variableb'
dimension_group = '/' + '/'.join(full_name_path.split('/')[1:-1])
print(dimension_group)
> '/group1/group2'

If so, I think you can simplify this to just
'/'.join(dimension_variable.full_name_path.split('/')[:-1])
Unless that full name path sometimes doesn't begin with a slash. In that case, you will get different results, but probably the right ones?
>>> full_name_path = 'group1/group2/group3'
>>> dimension_group = '/' + '/'.join(full_name_path.split('/')[1:-1])
>>> dimension_group
'/group2'
>>> '/'.join(full_name_path.split('/')[:-1])
'group1/group2'
I wrestled with this one for a bit when I wrote it. The root group is always '/', so I expect the full name path to always begin with a slash. In ATL16, all the variables are in the root group, so a full name path like `/npolar_grid_lat` would result in an empty string if a slash isn't added to the front.
Ok, I might suggest looking at PurePosixPath from pathlib (not only because we use it in datatree).
>>> from pathlib import PurePosixPath
>>> test_paths = ['/', '/root', '/root/group1', '/root/group1/group2']
>>> dimension_name = [PurePosixPath(x).name for x in test_paths]
>>> dimension_path = [PurePosixPath(x).parent for x in test_paths]
>>> dimension_name
['', 'root', 'group1', 'group2']
>>> dimension_path
[PurePosixPath('/'), PurePosixPath('/'), PurePosixPath('/root'), PurePosixPath('/root/group1')]
This might be overkill, but I would understand right away what was being done.
I like this a lot better! Much more readable. I have to convert the output to a string since PurePosixPath returns a PurePosixPath object, but that's easy to do.
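A minimal sketch of that final approach (the example path is hypothetical):

```python
from pathlib import PurePosixPath

full_name_path = '/group1/npolar_grid_lat'

# .name is already a str; .parent returns a PurePosixPath, so it is
# converted back to a string for use as a group path.
dimension_name = PurePosixPath(full_name_path).name           # 'npolar_grid_lat'
dimension_group = str(PurePosixPath(full_name_path).parent)   # '/group1'
```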
I like it too, Owen was too psyched. :)
Edit: that should have said, "Owen was not too psyched". That kinda changes it a bit. LOL
…acting names from dimension path.
tests/data/README.md
Outdated
* ATL16_prefetch.dmr
  - An example `.dmr` file retrieved from OPeNDAP for the ATL16 collection, but whittled
    down to only contain the six required dimension variables.<br><br>

* ATL16_prefetch.nc4
  - A sample output file that contains the six required dimension variables and one
    requested science variable in the ATL16 collection.<br><br>

* ATL16_prefetch_group.dmr
  - An example `.dmr` file that is identical to the `ATL16_prefetch.dmr` file
    except for an additional fabricated nested group variable, whereas all the variables in the
    ATL16 collection are in the root directory.<br><br>

* ATL16_prefetch_group.nc4
  - A sample output file that is nearly identical to the `ATL16_prefetch.nc4` file except
    for an additional fabricated nested group variable (the same one in the
    `ATL16_prefetch_group.dmr` file).<br><br>

* ATL16_prefetch_bnds.dmr
  - An example `.dmr` file that is identical to the `ATL16_prefetch.dmr` file
    except for four additional fabricated variables that represent the four
    possible cases of combining bounds variable existence and cell alignment.<br><br>
Should any of these .dmr/.nc4 files be combined?
    'user': 'jlovell',
})

hoss = HossAdapter(message, config=config(False), catalog=self.input_stac)
Nitpick question: Does `config=config(False)` just mean that we're not using a configuration file to create this? If so, is it necessary to have this, since the default behavior assumes there's no configuration file if one isn't included?
It means it's not validated, but it will still try to fill default values. https://github.com/nasa/harmony-service-lib-py/blob/main/harmony/util.py#L143
Ah, so the configuration can be drawn from the environment and not necessarily an input config file? (Or something along those lines)
The `config` function returns a Harmony `Config` object (so this is a different type of configuration from what we've previously been talking about - one with things like EDL information and staging bucket locations, etc.).
The function tries to pull a bunch of environment variables from the Docker container. With the `False`, the code then won't validate that it found all the required environment variables, or that they make sense. For local testing, where we are mocking things like interactions with AWS, we don't need/have values for all this stuff.
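A sketch of the two call styles being discussed; the `validate` keyword name is assumed from the linked `harmony.util` source:

```python
from harmony.util import config

# Reads settings (EDL credentials, staging bucket, etc.) from environment
# variables and checks that the required ones are present and sensible.
validated_config = config()

# Same environment lookup, but skips the validation step - useful for local
# tests where AWS interactions are mocked and many variables are unset.
test_config = config(validate=False)
```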
Oh got it. I've mixed up these two config object types before.
👨🍳 🤌 Nice!
(Although, do you need the `<br>`s in here now? That's just markdown.)
I messed around with it, but this was the only way I could get this specific spacing. Otherwise the file name would be on the next line after the previous file's description, and its own description two lines away, which was kinda ugly.
I think you just need 4 spaces to get the indented value. The GitHub-flavored markdown is a bit strange, but you probably don't want HTML directly in the README file. https://gist.github.com/flamingbear/f0a2d0df78dc6a7575865c14d3a27d76
I might see what you meant if you were trying to get that extra space between each filename.
Anyway, this is a nit and you can leave it the way it is if you want.
Yeah - if the `<br>` could be removed, that would be nice, but not vital. It seems like it might be an issue with how your local IDE is rendering the markdown, maybe?
I updated it - I am convinced now that it is indeed a me-problem with my VSCode markdown rendering.
@@ -225,7 +231,7 @@ def test_get_dimension_indices_from_indices(self):

     def test_add_index_range(self):
         """ Ensure the correct combinations of index ranges are added as
-            suffixes to the input variable based upon that variable's
+            suffixes to the input variable based upon that variable'scd
- suffixes to the input variable based upon that variable'scd
+ suffixes to the input variable based upon that variable's
prefetch_dataset = Dataset('tests/data/ATL16_prefetch.nc4', 'r')
dimension_path = '/npolar_grid_lat'

bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],
- bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],
+ expected_bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],

and below where you use it
I'm OK with these changes. I would have changed this too, but it's also a nit. [You did it while I was typing.] I'll let Owen approve.
Thanks for making the changes already requested - it's looking good. Nice work with the tests, too!!!
I had a few nits - open to discussion on those. There were two main comments that I think are more than nits:

- Making sure that you know for sure that the index ranges in the DAP4 constraint expression for the end-to-end test:
  - Are what you expect given the input bounding box and the ATL16 dimension data.
  - Use a bounding box that would get wrong index values if HOSS thought the pixels were cell-centred.
- Extending the assertions of both subtests in `test_write_bounds` to make sure you are confident it is doing everything it needs to, e.g.:
  - The bounds variable is where it should be.
  - The metadata attributes of the prefetch file have been updated (the `bounds` attribute on the dimension variable).
  - The `VarInfoFromDmr` has been updated (the `references` on the `VariableFromDmr` instance for the dimension variable).
hoss/dimension_utilities.py
Outdated
                 logger: Logger) -> None:
    """ Augment a NetCDF4 file with artificial bounds variables for each
        dimension variable that is edge-aligned and does not already
        have bounds variables.
Total nitpick, but it might be worth adding to this first paragraph that the need for adding artificial bounds variables is identified by the `earthdata-varinfo` configuration file.
Do you mean that I should say we're making these bounds by adding the CF override to `hoss_config.json`, or something more directly in `earthdata-varinfo`?
I think the latter - i.e., keep the existing first sentence ("Augment a NetCDF4 file with artificial bounds variables for each dimension variable that is edge-aligned and does not already have bounds variables.") plus something like: "edge-aligned" and "has-bounds" are defined by `earthdata-varinfo` attributes, which come from the NetCDF4 variable attributes or, if necessary, from overrides in the configuration file.
I might slightly disagree with some of the text above. `edge-aligned` doesn't appear in any NetCDF-4 file, because it's something we've coined (and really wish was in the NetCDF-4 file). Also (yet another nit), the metadata attribute for bounds is just `bounds`, not `has-bounds`.
But the `cell_alignment`/`edge-aligned` attribute only exists in `hoss_config.json`, right? Does that make it an `earthdata-varinfo` attribute? Like, "Augment a NetCDF4 file with artificial bounds variables for each dimension variable that has been identified by `hoss_config.json` to have an `edge-aligned` attribute"?
Or instead of `hoss_config.json`, do we say "`earthdata-varinfo` configuration file" as a catch-all for all applications?
(I may be over-complicating this, I'm sorry!)
hoss/dimension_utilities.py
Outdated
        (2) the new bounds variable dimension.

    """
    # Create the bounds array.
Nitpick: Feels like a superfluous comment.
hoss/dimension_utilities.py
Outdated
bounds_dimension_name = dimension_name + 'v'
# Consider the special case when the dimension group is the root directory.
# The dimension can't refer to the full path in the name itself, so we have
# to create it with respect to the group we want to place it in.
(Totally just a) Nitpick: Maybe this comment could be truncated a tiny bit, and as it is specific to the first condition in the `if`/`else` block, it could probably go in that block itself:

if dimension_group == '/':
    # This is the comment specifically about this block...
    bounds_dim = ...
else:
    bounds_dim = ...
                         [74.0, 73.0], [73.0, 72.0], [72.0, 71.0], [71.0, 70.0],
                         [70.0, 69.0], [69.0, 68.0], [68.0, 67.0], [67.0, 66.0],
                         [66.0, 65.0], [65.0, 64.0], [64.0, 63.0], [63.0, 62.0],
                         [62.0, 61.0], [61.0, 60.0]])
Nit: I possibly just spotted this because it's a long array, but you reuse it between a couple of tests. Maybe it could go in the `setUpClass` and you could reuse the same array for the assertions in `test_write_bounds` and `test_get_bounds_array`.
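A minimal sketch of that refactor (the class name and truncated array are illustrative):

```python
import numpy as np
from unittest import TestCase


class TestDimensionUtilities(TestCase):
    @classmethod
    def setUpClass(cls):
        # Defined once, for reuse by both test_write_bounds and
        # test_get_bounds_array.
        cls.expected_bounds = np.array([[90.0, 89.0], [89.0, 88.0],
                                        [88.0, 87.0], [87.0, 86.0]])

    def test_get_bounds_array(self):
        # ...assertions would compare against self.expected_bounds...
        self.assertEqual(self.expected_bounds.shape, (4, 2))
```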
with self.subTest('Dimension variable is in the root group'):
    root_variable_full_path = '/npolar_grid_lat'
    root_varinfo_variable = varinfo_prefetch.get_variable(root_variable_full_path)
    root_variable_name = str(PurePosixPath(root_variable_full_path).name)
Opinion ahoy: This feels a little complicated. I think with tests where you know a string value up front, it's sometimes better to just explicitly spell it out, rather than constructing it. (It makes the tests more readable later on.)
Similar in the nested group subtest.
CHANGELOG.md
Outdated
adding the attribute `cell_alignment` with the value `edge` to `hoss_config.json`
for edge-aligned collections (namely, ATL16), and by adding functions that
create pseudo bounds for edge-aligned collections to make HOSS use the
`dimension_utilities.py` function, `get_dimension_indices_from_bounds`.
Given that these are our public release notes, it might be worth explicitly stating in some way that the pseudo-bounds variables are not returned in the actual HOSS output.
CHANGELOG.md
Outdated
issue with the ATL16 metadata for the variables `/spolar_asr_obs_grid` and
`/spolar_lorate_blowing_snow_freq` where their `grid_mapping` attribute points
to north polar variables instead of south polar variables. This CF Override
will have to be removed if/when the metadata is corrected.
Nit: It's a bit of a strong statement to say it will have to be removed. It can be (and I think it would be good to do so), but it's unlikely that it would break anything by having an override that has the same information as the attribute it's overriding.
…end-to-end test, add tests to increase coverage in the unit test.
This PR is getting super close. I think the only place I'm still looking at is those two larger test comments:

- Just looking for confirmation that you verified the constraint expression is what you expected, and it's testing the edge-aligned case specifically.
- A couple of small tweaks to the `write_bounds` unit tests.

Thanks for all the other clean-up!
assert_array_equal(resulting_bounds_root_data,
                   expected_bounds_data)
# Check that bounds variable exists in the root group.
self.assertTrue(prefetch_dataset.variables[root_variable_name])
This is a nit, and you can ignore it, but it feels like maybe this assertion logically could go before you try and access the values in the bounds variables.
If the bounds variable doesn't exist, the code will likely throw some sort of exception on L430 (where you try and access the array values), which means you'll never actually run this assertion when it would fail.
I can see that logically. If the bounds variable doesn't exist, there's no point in doing the rest of the tests. Also, should this be:

# Check that bounds variable exists in the root group.
self.assertTrue(prefetch_dataset.variables[root_bounds_name])

with `root_bounds_name` instead of `root_variable_name`? Might've just caught my own mistake here.
Yeah, I have the right `nested_bounds_variable` in the nested group test, so that must have been a copy/paste error.
Good catch - I also hadn't spotted that, but I think you are right. Nice!
# (This is only the 'bounds' attribute, since other attributes
# are not required.)
root_bounds_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)
Disclaimer, I might be getting mixed up here:

# L423
root_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)

and

# L439
root_bounds_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)

So aren't the next two assertions doing the same test?
Also, `root_bounds_varinfo_variable` kind of implies that you are getting the bounds variable (e.g., `/npolar_grid_lat_bnds`) rather than the dimension variable.
These variable names are... very difficult. I can maybe try to reorganize them so they make at least a little more sense.
Yeah, those are doing the same thing, and I did mean to have a `root_bounds_varinfo_variable`. Wasn't this test for checking out the bounds variable? I must be turned around. Looking below, I guess I thought #3 was what this test was doing.
Nice one! I think you've addressed all my comments (thanks for putting up with them all!).
Description
This PR adds the ability for HOSS to correctly handle edge-aligned geographic collections. This is done by creating an artificial bounds variable when one doesn't already exist, so that the subset indices can be correctly calculated via the `get_dimension_indices_from_bounds` function in `dimension_utilities.py`. A new CF override attribute called `cell_alignment` was added to note collections that don't have center-aligned grid cells (where center-alignment is the default behavior). Four functions were added to perform this check, calculation, and variable creation:
This change also adds a CF override that addresses an issue with the ATL16 metadata for the variables `/spolar_asr_obs_grid` and `/spolar_lorate_blowing_snow_freq`, where their `grid_mapping` attribute points to north polar variables instead of south polar variables. This CF override will have to be removed if/when the metadata is corrected.
Jira Issue ID
DAS-1177
Local Test Steps
Execute the HOSS regression tests for ATL16 locally.
`./bin/build-image && ./bin/build-test && ./bin/run-test`
All tests should pass.
PR Acceptance Checklist
* `CHANGELOG.md` updated to include high-level summary of PR changes.
* `VERSION` updated if publishing a release.