
DAS-1177: Add functions and CF overrides to create artificial bounds … #5

Merged (11 commits, Mar 14, 2024)

Conversation

@lyonthefrog (Collaborator) commented Feb 27, 2024:

Description

This PR adds the ability for HOSS to correctly handle edge-aligned geographic collections. It does so by creating an artificial bounds variable when one doesn't already exist, so that the subset indices can be correctly calculated via the get_dimension_indices_from_bounds function in dimension_utilities.py. A new CF override attribute called cell_alignment was added to mark collections that don't have center-aligned grid cells (center alignment is the default behavior).
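For context, a sketch of what the cell_alignment entry might look like in hoss_config.json, following the Name/Value structure of the grid_mapping override quoted later in this conversation (the surrounding applicability rules are omitted and assumed):

```json
{
  "Name": "cell_alignment",
  "Value": "edge"
}
```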

Four functions were added to perform this check, calculation, and variable creation (sketched after the list below):

  • add_bounds_variables
  • needs_bounds
  • create_bounds
  • write_bounds
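A rough sketch of how two of these functions could fit together; the names come from this PR, but the signatures and bodies below are illustrative assumptions rather than the merged code:

```python
import numpy as np


def needs_bounds(dimension_variable) -> bool:
    """ Sketch: bounds are needed when the dimension is marked edge-aligned
        (via the cell_alignment CF override) and carries no existing bounds
        metadata. The attribute access pattern here is assumed.
    """
    return (dimension_variable.attributes.get('cell_alignment') == 'edge'
            and dimension_variable.attributes.get('bounds') is None)


def create_bounds(dimension_array: np.ndarray) -> np.ndarray:
    """ Sketch: each cell's lower bound is its own (edge-aligned) value and
        its upper bound is the next cell's value, extrapolated for the final
        cell using the median resolution.
    """
    resolution = np.median(np.diff(dimension_array))
    upper = np.append(dimension_array[1:], dimension_array[-1] + resolution)
    return np.column_stack([dimension_array, upper])
```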

This change also adds a CF override that addresses an
issue with the ATL16 metadata for the variables /spolar_asr_obs_grid and
/spolar_lorate_blowing_snow_freq where their grid_mapping attribute points
to north polar variables instead of south polar variables. This CF Override
will have to be removed if/when the metadata is corrected.

Jira Issue ID

DAS-1177

Local Test Steps

Execute the HOSS regression tests for ATL16 locally.

  • Open Docker Desktop.
  • Pull the latest version of this repository and checkout the DAS-1177 branch.
  • In the root directory, run the unit tests via:

./bin/build-image && ./bin/build-test && ./bin/run-test

All tests should pass.

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • VERSION updated if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

"Name": "grid_mapping",
"Value": "crs_latlon: spolar_grid_lat crs_latlon: spolar_grid_lon"
}
]
@lyonthefrog (Collaborator, author) commented Feb 27, 2024:

This entry addresses what we think is a mistake in the ATL16 v004 data for the two indicated variables, where the grid_mapping attribute references north polar dimensions instead of south polar dimensions. Without this CF override, all the subsets that include either of these variables would fail due to the existing bug captured in DAS-1804.

Contributor:

For the record, that grid_mapping value is referencing the crs_latlon grid-mapping variable for both horizontal dimensions (lat, lon), here explicitly identified in the grid_mapping reference value. A simpler entry would be, I think, to specify "crs_latlon" as a grid_mapping variable that implicitly applies to all dimensions. This posted override is in keeping with the other ATL16 grid_mapping reference attributes that explicitly call out the dimensional applicability.

Member:

Totally agree that crs_latlon is simpler, but it seems best to match the format of the other entries for grid_mapping that NSIDC is using (which is still a valid format per the CF Conventions).

Contributor:

I almost wonder if HOSS should be a bit less "diligent" in reading those attributes and instead use other heuristics (likely already in use) for identifying the relevant horizontal dimensions for grid mapping references. They seem superfluous in this case, and then they got them wrong!

@lyonthefrog (Collaborator, author):

Is it HOSS reading those attributes, or varinfo? I saw that varinfo specifically checks the grid_mapping attribute if one exists.

Member:

Good point @lyonthefrog - it is VarInfoFromDmr doing the reading. And I think we want to keep earthdata-varinfo as aligned to the standard (CF Conventions) as possible. This format of representation is valid per the CF Conventions, so I think we're okay. The only intrinsic error here was that the metadata was incorrect in the file. (Quirky Data!)

```python
if dimension_group == '/':
    bounds_dim = dimension_dataset.createDimension(dimension_name + '_bnds_dim', 2)
else:
    bounds_dim = dimension_dataset[dimension_group].createDimension(dimension_name + '_bnds_dim', 2)
```
@lyonthefrog (Collaborator, author):

It appears the createDimension function in the Python netCDF4 library doesn't recognize full path names, so I placed the dimension in the right group via the group reference in front of the function call. However, this doesn't appear to work when the dimension is in the root group.

There's gotta be a better way to do this, but I wrestled with the netCDF4 library for a while to no avail.

Member:

Yeah - this is frustrating. I didn't find a quick alternative. Maybe there's a cleverer way to split the dimension basename off the group with string manipulation above, but it seems like the logic you have here, doing one thing for the root group and another for a nested one, is the right way to go. (It's also really frustrating that dimension_dataset['/'] doesn't work like other groups.)
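One possible condensed form (a sketch only; it still special-cases the root group, since dimension_dataset['/'] reportedly doesn't behave like other group lookups):

```python
# Resolve the owning group object once, then create the dimension on it.
target_group = (dimension_dataset if dimension_group == '/'
                else dimension_dataset[dimension_group])
bounds_dim = target_group.createDimension(dimension_name + '_bnds_dim', 2)
```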

```python
# Calculate final values.
dim_resolution = np.median(np.diff(dimension_array))
min_max_pairs.append([dimension_array[size-1],
                      dimension_array[size-1] + dim_resolution])
```
Contributor:

I assume this is tested, so I'm suspicious of this observation, but... On line 160 above the min_max_pairs are being populated for the range (0 .. size-1). And then, here, an additional value is being populated. Wouldn't that be one beyond what we want to populate? Also on line 159, referencing dimension_array[idx+1] for idx = (size-1) - wouldn't that be one beyond the range of dimension_array?

@lyonthefrog (Collaborator, author) commented Feb 27, 2024:

Oh right! Python ranges include values up to, but not including, the second value. So the final idx value in the min_max_pairs loop will be size-2, since it's range(0, size-1).

So if the size of the dimension array is, say, 60, the loop writes pair values up to index 58, and then the final-value calculation appends [dimension_array[60-1], dimension_array[60-1] + res], which amounts to a final index of 59.
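The same logic as a runnable sketch (hypothetical values; the loop approximates the code under review):

```python
import numpy as np

# Hypothetical edge-aligned dimension: 30 cells at 3-degree resolution.
dimension_array = np.arange(0.0, 90.0, 3.0)
size = dimension_array.size

# Pairs for indices 0 .. size-2 come straight from adjacent values.
min_max_pairs = [[dimension_array[idx], dimension_array[idx + 1]]
                 for idx in range(size - 1)]

# The final cell's upper bound is extrapolated with the median resolution,
# supplying the pair at index size-1.
dim_resolution = np.median(np.diff(dimension_array))
min_max_pairs.append([dimension_array[size - 1],
                      dimension_array[size - 1] + dim_resolution])
```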

@owenlittlejohns (Member) left a comment:

This is exciting stuff!

A lot of my comments are about clarifying things with variable naming and maybe using numpy a bit more. (Also styling nits)

The other really big thing is the need for unit tests (and fixing the tests that are now broken). We're going to talk about those later today, so they can be included in this PR.

```python
# The dimension can't refer to the full path in the name itself, so we have
# to create it with respect to the group we want to place it in.
if dimension_group == '/':
    bounds_dim = dimension_dataset.createDimension(dimension_name + '_bnds_dim', 2)
```
Member:

The standard naming convention is generally something like latv, lonv, or nv (for lat, lon and time dimensions). Probably the best thing would be to call this dimension dimension_name + 'v'.

Member:

Also, because this string is used in a couple of places, you should probably declare it as a variable and then refer to that variable. (Just so that if the definition changes in the future, you only have to update one place)

```python
variable_dimension = dimension_dataset[dimension_full_name_path].dimensions[0]

bounds_data_type = str(dimension_variable.data_type)
bounds = dimension_dataset.createVariable(dimension_full_name_path + '_bnds',
```
Member:

Similar comment about reusing the bounds name - you define it on L211 - could you move that declaration up and use it here, too? (Or the base name from that declaration, at least.)

@lyonthefrog (Collaborator, author):

Did I update this like you were imagining? Since the last two variable names were independent of each other I wasn't entirely sure.

Commit: "…de with simpler/best practice code, and make a lot of minor formatting updates."
@flamingbear (Member) left a comment:

I had just a couple of pythonic things and simplifications. And just a few questions in general. I am trusting everyone's judgement on the hoss config file.

```python
with Dataset(dimensions_nc4, 'r+') as prefetch_dataset:
    for dimension_name in required_dimensions:
        dimension_variable = varinfo.get_variable(dimension_name)
        if needs_bounds(dimension_variable) is True:
```
Member:

I think it is more pythonic to just test the returned value here:

Suggested change:

```diff
- if needs_bounds(dimension_variable) is True:
+ if needs_bounds(dimension_variable):
```

ref: https://peps.python.org/pep-0008/#programming-recommendations

@lyonthefrog (Collaborator, author):

Thanks for attaching the reference! I see that one there.

```
<...>  <...>
 81     84
 84     87
 87      ? -> 87 + median resolution -> 87 + 3 -> 90
```
Member:

I have a dumb question about edge-alignment.

Is it always the case that when something is edge-aligned, the values represent the left side of the grid cell? Or, in the case of a vertical y-positive axis dimension, the bottom?

I'm probably confusing this with something else, but I can't find the reference at all.

```python
# Access the dimension variable's data using the variable's full path.
dimension_array = prefetch_dataset[dimension_path][:]

median_resolution = np.median(np.diff(dimension_array))
```
Member:

Doesn't the dimension array have constant differences between cells? I.e., Value(N+1) - Value(N) is the same for all N?

If so, this is probably doing more math than we need; we could just compute the difference between any two cells. If not, then this won't be precisely accurate. Is that the "good enough" for subsetting that we're going for?

@lyonthefrog (Collaborator, author) commented Feb 28, 2024:

> Is it always the case that when something is edge-aligned, the values represent the left side of the grid cell?

No, I'm pretty sure any type of edge alignment is possible. We decided to stick with handling the lower left alignment case since that's ATL16's alignment and accounting for all the possible alignments could balloon very quickly. I think we're thinking/hoping that this is the most common edge-alignment too. I might have just made some of that up though, what say you @owenlittlejohns?

All that being said, this should probably be documented in the code, huh.

@lyonthefrog (Collaborator, author):

> Doesn't the dimension array have constant differences between cells?

@autydp and @owenlittlejohns looked over this calculation with me, and they figured the cell differences might not always be constant. So yes, what I currently have here is a "good enough". They can weigh in though!

Member:

I might still argue that the (N-2, N-1) difference is a better predictor of the (N-1, N) difference than the entire median, but I will not push it.

Member:

I think the main thing here is that there can be a bit of floating-point level variation. Maybe that's not super important for the very last pixel, but it's bitten us before at times for requests with bounding boxes hitting the exact boundary between pixels. My preference for the median diff comes from a combination of that, and that it's still all a pretty straightforward one-liner to get it with numpy.
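A tiny illustration of that floating-point concern (hypothetical values, not ATL16 data):

```python
import numpy as np

# 0.1 has no exact binary representation, so a grid accumulated from it
# has slightly non-constant cell-to-cell differences.
grid = np.cumsum(np.full(10, 0.1))
print(np.unique(np.diff(grid)))  # several distinct 'resolutions' near 0.1
print(np.median(np.diff(grid)))  # a single robust estimate, still a one-liner
```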

```python
                  dimension_variable.full_name_path)

# Create the second bounds dimension.
dimension_group = '/' + '/'.join(dimension_variable.full_name_path.split('/')[1:-1])
```
Member:

Is this just removing the last item behind a path-like string?

e.g., for /group1/group2/variableb:

```python
full_name_path = '/group1/group2/variableb'

dimension_group = '/' + '/'.join(full_name_path.split('/')[1:-1])
print(dimension_group)
# '/group1/group2'
```

If so, I think you can simplify this to just:

```python
'/'.join(dimension_variable.full_name_path.split('/')[:-1])
```

Member:

Unless that full name path sometimes doesn't begin with a slash. But in that case, you will get different results, but probably the right ones?

```python
>>> full_name_path = 'group1/group2/group3'
>>> dimension_group = '/' + '/'.join(full_name_path.split('/')[1:-1])
>>> dimension_group
'/group2'
>>> '/'.join(full_name_path.split('/')[:-1])
'group1/group2'
```

@lyonthefrog (Collaborator, author):

I wrestled with this one for a bit when I wrote it. The root group is always '/', so I expect the full name path to always begin with a slash. In ATL16, all the variables are in the root group so a full name path like /npolar_grid_lat would result in an empty string if a slash isn't added to the front.

Member:

OK, I might suggest looking at PurePosixPath from pathlib (not only because we use it in datatree).

```python
>>> from pathlib import PurePosixPath
>>> test_paths = ['/', '/root', '/root/group1', '/root/group1/group2']
>>> dimension_name = [PurePosixPath(x).name for x in test_paths]
>>> dimension_path = [PurePosixPath(x).parent for x in test_paths]
>>> dimension_name
['', 'root', 'group1', 'group2']
>>> dimension_path
[PurePosixPath('/'), PurePosixPath('/'), PurePosixPath('/root'), PurePosixPath('/root/group1')]
```

This might be overkill, but I would understand right away what was being done.

@lyonthefrog (Collaborator, author):

I like this a lot better! Much more readable. I have to convert the output to a string since PurePosixPath returns a PurePosixPath object, but that's easy to do.
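For instance (a sketch reusing a path from earlier in the thread):

```python
from pathlib import PurePosixPath

full_name_path = '/npolar_grid_lat'
dimension_name = PurePosixPath(full_name_path).name          # 'npolar_grid_lat'
dimension_group = str(PurePosixPath(full_name_path).parent)  # '/'
```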

Member:

I like it too, Owen was too psyched. :)

Member:

Edit: that should have said "Owen was not too psyched". That kinda changes it a bit. LOL

Comment on lines 97 to 118
* ATL16_prefetch.dmr
- An example `.dmr` file retrieved from OPeNDAP for the ATL16 collection, but whittled
down to only contain the six required dimension variables.<br><br>

* ATL16_prefetch.nc4
- A sample output file that contains the six required dimension variables and one
requested science variable in the ATL16 collection.<br><br>

* ATL16_prefetch_group.dmr
- An example `.dmr` file that is identical to the `ATL16_prefetch.dmr` file
except for an additional fabricated nested group variable, whereas all the variables in the
ATL16 collection are in the root directory.<br><br>

* ATL16_prefetch_group.nc4
- A sample output file that is nearly identical to the `ATL16_prefetch.nc4` file except
for an additional fabricated nested group variable (the same one in the
ATL16_prefetch_group.dmr file).<br><br>

* ATL16_prefetch_bnds.dmr
- An example `.dmr` file that is identical to the `ATL16_prefetch.dmr` file
except for four additional fabricated variables that represent the four
possible cases of combining bounds variable existence and cell alignment.<br><br>
@lyonthefrog (Collaborator, author):

Should any of these .dmr/.nc4 files be combined?

```python
    'user': 'jlovell',
})

hoss = HossAdapter(message, config=config(False), catalog=self.input_stac)
```
@lyonthefrog (Collaborator, author):

Nitpick question: Does config=config(False) just mean that we're not using a configuration file to create this? If so, is it necessary to have this since the default behavior assumes there's no configuration file if one isn't included?

Member:

It means it's not validated, but it will still try to fill default values. https://github.com/nasa/harmony-service-lib-py/blob/main/harmony/util.py#L143

@lyonthefrog (Collaborator, author):

Ah, so the configuration can be drawn from the environment and not necessarily an input config file? (Or something along those lines)

Member:

The config function returns a Harmony Config object (so this is a different type of configuration from what we've previously been talking about - one with things like EDL information and staging bucket locations, etc).

The function tries to pull a bunch of environment variables from the Docker container. With the False, the code then won't validate that it found all the required environment variables, or that they make sense. For local testing, where we are mocking things like interactions with AWS, we don't need/have values for all this stuff.
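A sketch of the distinction being described, assuming the validate parameter from the linked harmony/util.py source:

```python
from harmony.util import config

# Validated: checks that required environment variables (EDL credentials,
# staging bucket, etc.) are present, as appropriate for deployed services.
deployed_config = config()

# Unvalidated: fills what it can from the environment and skips the checks,
# which suits local tests that mock AWS and EDL interactions.
test_config = config(validate=False)
```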

@lyonthefrog (Collaborator, author):

Oh got it. I've mixed up these two config object types before.

@flamingbear (Member) commented Mar 12, 2024:

👨‍🍳 🤌 Nice!

(Although, do you need the <br>'s in here now? That's just markdown.)

@lyonthefrog (Collaborator, author):

I messed around with it, but this was the only way I could get this specific spacing. Otherwise the file name would be on the next line after the previous file's description, and its own description two lines away, which was kinda ugly.

Member:

I think you just need four spaces to get the indented value. GitHub-flavored markdown is a bit strange, but we probably don't want HTML directly in the README file. https://gist.github.com/flamingbear/f0a2d0df78dc6a7575865c14d3a27d76

Member:

I might see what you meant if you were trying to get that extra space between each filename.

Member:

Anyway, this is a nit and you can leave it the way it is if you want.

Member:

Yeah - if the <br> could be removed, that would be nice, but not vital. It seems like it might be an issue with how your local IDE is rendering the markdown, maybe?

@lyonthefrog (Collaborator, author):

Hm, I think maybe my VSCode is just rendering this differently. Here's what I get when I put this in my README.md:

[image]

I appreciate nitpicks because I want to write good and pretty code! I might just leave this one as is, just in case me overthinking markdown takes more time than it's worth.

@lyonthefrog (Collaborator, author):

I updated it - I am convinced now that it is indeed a me-problem with my VSCode markdown rendering.

```diff
@@ -225,7 +231,7 @@ def test_get_dimension_indices_from_indices(self):

     def test_add_index_range(self):
         """ Ensure the correct combinations of index ranges are added as
-            suffixes to the input variable based upon that variable's
+            suffixes to the input variable based upon that variable'scd
```
Member:

Suggested change:

```diff
- suffixes to the input variable based upon that variable'scd
+ suffixes to the input variable based upon that variable's
```

```python
prefetch_dataset = Dataset('tests/data/ATL16_prefetch.nc4', 'r')
dimension_path = '/npolar_grid_lat'

bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],
```
Member:

Suggested change:

```diff
- bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],
+ expected_bounds_array = np.array([[90.0, 89.0], [89.0, 88.0], [88.0, 87.0], [87.0, 86.0],
```

and below where you use it.

@flamingbear (Member) left a comment:

I'm OK with these changes. I would change this, but it's also a nit. [You did while I was typing.] I'll let Owen approve.

@owenlittlejohns (Member) left a comment:

Thanks for making the changes already requested - it's looking good. Nice work with the tests, too!!!

I had a few nits - open to discussion on those. There were two main comments that I think are more than nits:

  • Making sure that you know for sure that the index ranges in the DAP4 constraint expression for the end-to-end test:
Are what you expect given the input bounding box and the ATL16 dimension data.
    • Use a bounding box that would get wrong index values if HOSS thought the pixels were cell-centred.
  • Extending the assertions of both sub tests in test_write_bounds to make sure you are confident it is doing everything it needs to, e.g.:
    • The bounds variable is where it should be
    • The metadata attributes of the prefetch file have been updated (the bounds attribute on the dimension variable).
    • The VarInfoFromDmr has been updated (the references on the VariableFromDmr instance for the dimension variable).

```python
                         logger: Logger) -> None:
    """ Augment a NetCDF4 file with artificial bounds variables for each
        dimension variable that is edge-aligned and does not already
        have bounds variables.
```
Member:

Total nitpick, but it might be worth adding to this first paragraph that the need for adding artificial bounds variables is identified by the earthdata-varinfo configuration file.

@lyonthefrog (Collaborator, author):

Do you mean that I should say we're making these bounds by adding the CF override to hoss_config.json, or something more directly in earthdata-varinfo?

Contributor:

I think the latter - i.e., Augment a NetCDF4 file with artificial bounds variables for each dimension variable that is edge-aligned and does not already have bounds variables. + "edge-aligned" and "has-bounds" are defined by earthdata-varinfo attributes, which come from the netcdf4 variable attributes, or if necessary by overrides in the configuration file

Member:

I might slightly disagree with some of the text above. edge-aligned doesn't appear in any NetCDF-4 file, because it's something we've coined (and really wish was in the NetCDF-4 file). Also (yet another nit) the metadata attribute for bounds is just bounds not has-bounds.

@lyonthefrog (Collaborator, author):

But the cell_alignment/edge-aligned attribute only exists in hoss_config.json, right? Does that make it an earthdata-varinfo attribute? Like,

"Augment a NetCDF4 file with artificial bounds variables for each dimension variable that has been identified by hoss_config.json to have an edge-aligned attribute"?

Or instead of hoss_config.json, we say "earthdata-varinfo configuration file" as a catch-all for all applications?

(I may be over-complicating this, I'm sorry!)

```python
        (2) the new bounds variable dimension.

    """
    # Create the bounds array.
```
Member:

Nitpick: Feels like a superfluous comment.

```python
bounds_dimension_name = dimension_name + 'v'
# Consider the special case when the dimension group is the root directory.
# The dimension can't refer to the full path in the name itself, so we have
# to create it with respect to the group we want to place it in.
```
Member:

(Totally just a) Nitpick: Maybe this comment could be truncated a tiny bit, and as it is specific to the first condition in the if/else block, it could probably go in that block itself:

```python
if dimension_group == '/':
    # This is the comment specifically about this block...
    bounds_dim = ...
else:
    bounds_dim = ...
```


```python
                          [74.0, 73.0], [73.0, 72.0], [72.0, 71.0], [71.0, 70.0],
                          [70.0, 69.0], [69.0, 68.0], [68.0, 67.0], [67.0, 66.0],
                          [66.0, 65.0], [65.0, 64.0], [64.0, 63.0], [63.0, 62.0],
                          [62.0, 61.0], [61.0, 60.0]])
```
Member:

Nit: I possibly just spotted this because it's a long array, but you reuse it between a couple of tests. Maybe it could go in the setUpClass and you could reuse the same array for the assertions in test_write_bounds and test_get_bounds_array.
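A sketch of that suggestion (class and attribute names are hypothetical, and the array is truncated here):

```python
from unittest import TestCase

import numpy as np


class TestDimensionUtilities(TestCase):
    @classmethod
    def setUpClass(cls):
        # Build the shared expected-bounds fixture once for the whole class,
        # then reuse it in test_write_bounds and test_get_bounds_array.
        cls.expected_bounds_array = np.array([[90.0, 89.0], [89.0, 88.0],
                                              [88.0, 87.0], [87.0, 86.0]])
```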

```python
with self.subTest('Dimension variable is in the root group'):
    root_variable_full_path = '/npolar_grid_lat'
    root_varinfo_variable = varinfo_prefetch.get_variable(root_variable_full_path)
    root_variable_name = str(PurePosixPath(root_variable_full_path).name)
```
Member:

Opinion ahoy: This feels a little complicated. With tests where you know a string value up front, I think it's sometimes better to just spell it out explicitly rather than constructing it. (It makes the tests more readable later on.)

Member:

Similar in the nested group subtest.

CHANGELOG.md (outdated):
adding the attribute `cell_alignment` with the value `edge` to `hoss_config.json`
for edge-aligned collections (namely, ATL16), and by adding functions that
create pseudo bounds for edge-aligned collections to make HOSS use the
`dimension_utilities.py` function, `get_dimension_indices_from_bounds`.
Member:

Given that these are our public release notes, it might be worth explicitly stating in some way that the pseudo-bounds variables are not returned in the actual HOSS output.

CHANGELOG.md (outdated):
issue with the ATL16 metadata for the variables `/spolar_asr_obs_grid` and
`/spolar_lorate_blowing_snow_freq` where their `grid_mapping` attribute points
to north polar variables instead of south polar variables. This CF Override
will have to be removed if/when the metadata is corrected.
Member:

Nit: It's a bit of a strong statement to say it will have to be removed. It can be (and I think it would be good to do so), but it's unlikely that it would break anything by having an override that has the same information as the attribute it's overriding.

Commit: "…end-to-end test, add tests to increase coverage in the unit test."
@owenlittlejohns (Member) left a comment:

This PR is getting super close. I think the only place I'm still looking at is those two larger test comments:

  • Just looking for confirmation that you verified the constraint expression is what you expected, and it's testing the edge-aligned case specifically.
  • A couple of small tweaks to the write_bounds unit tests.

Thanks for all the other clean-up!

```python
assert_array_equal(resulting_bounds_root_data,
                   expected_bounds_data)
# Check that bounds variable exists in the root group.
self.assertTrue(prefetch_dataset.variables[root_variable_name])
```
Member:

This is a nit, and you can ignore it, but it feels like maybe this assertion logically could go before you try and access the values in the bounds variables.

If the bounds variable doesn't exist, the code will likely throw some sort of exception on L430 (where you try and access the array values), which means you'll never actually run this assertion when it would fail.

@lyonthefrog (Collaborator, author):

I can see that logically. If the bounds variable doesn't exist, there's no point in doing the rest of the tests. Also, should this be:

```python
# Check that bounds variable exists in the root group.
self.assertTrue(prefetch_dataset.variables[root_bounds_name])
```

with root_bounds_name instead of root_variable_name? Might've just caught my own mistake here.

@lyonthefrog (Collaborator, author):

Yeah, I have the right nested_bounds_variable in the nested group test, so that must have been a copy/paste error.

Member:

Good catch - I also hadn't spotted that, but I think you are right. Nice!

```python
# (This is only the 'bounds' attribute, since other attributes
# are not required.)
root_bounds_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)
```
Member:

Disclaimer, I might be getting mixed up here:

```python
# L423
root_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)
```

and

```python
# L439
root_bounds_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path)
```

So aren't the next two assertions doing the same test?

Also, root_bounds_varinfo_variable kind of implies that you are getting the bounds variable (e.g., /npolar_grid_lat_bnds) rather than the dimension variable.
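A sketch of the change this comment implies (the '_bnds' suffix follows the naming used earlier in the PR):

```python
# Fetch the bounds variable itself, not the dimension variable again.
root_bounds_varinfo_variable = varinfo_prefetch.get_variable(
    root_variable_full_path + '_bnds')
```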

@lyonthefrog (Collaborator, author):

These variable names are... very difficult. I can maybe try to reorganize them to make them make at least a little bit more sense.

Yea, those are doing the same thing and I did mean to have a root_bounds_varinfo_variable. Wasn't this test for checking out the bounds variable? I must be turned around. Looking below, I guess I thought #3 was what this test was doing.

@owenlittlejohns (Member) left a comment:

Nice one! I think you've addressed all my comments (thanks for putting up with them all!)

@lyonthefrog merged commit deb2796 into main on Mar 14, 2024. 3 checks passed.
@lyonthefrog deleted the DAS-1177 branch on March 14, 2024 at 23:25.