
DAS-2232 - Functionality added to support SMAP L3 products #15

Open. Wants to merge 8 commits into main.

Conversation

@sudha-murthy (Collaborator) commented Sep 10, 2024

Description

SMAP L3 products lack the 1-D dimension scales and grid mapping variable that HOSS needs to perform spatial subsetting.
This PR adds methods to override the missing grid mapping via entries in hoss_config.json, plus supporting methods.
It also adds methods that use the 2-D coordinate datasets to generate 1-D dimension scales, which are used to calculate
the index ranges that produce the spatially subsetted outputs.
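The heart of the approach (deriving 1-D dimension scales from 2-D coordinate datasets) can be sketched as follows. The array shapes, values, and helper name are illustrative, not the PR's actual code, and the sketch assumes a regular grid in which every row shares one latitude and every column shares one longitude:

```python
import numpy as np

# Illustrative stand-ins for SMAP L3's 2-D coordinate datasets on a
# regular grid (shapes and values are made up for this sketch).
latitudes = np.repeat(np.linspace(85.0, -85.0, 406)[:, np.newaxis], 964, axis=1)
longitudes = np.repeat(np.linspace(-180.0, 180.0, 964)[np.newaxis, :], 406, axis=0)

def get_1d_dimension_scales(lat_2d, lon_2d):
    """Collapse 2-D coordinates to 1-D dimension scales (hypothetical
    helper). Valid only when the grid is regular, so each row has a
    single latitude and each column a single longitude."""
    return lat_2d[:, 0], lon_2d[0, :]

y_scale, x_scale = get_1d_dimension_scales(latitudes, longitudes)
```

HOSS can then search these 1-D scales for the indices bracketing a bounding box, exactly as it would for real dimension variables.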

Jira Issue ID

DAS-2232

Local Test Steps

  • Spatial subsetting can be tested without mask fill (until DAS-2215 is complete).
  • Run Harmony-in-a-Box with HOSS.
  • Comment out mask_fill in the services.yaml file.
  • Run a spatial subset request locally and ensure you get the correctly subsetted output, e.g.:
  1. http://localhost:3000/C1268452378-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&granuleId=G1268452388-EEDTEST&subset=lat(70%3A80)&subset=lon(-160%3A-150)&variable=Soil_Moisture_Retrieval_Data_AM%2Fstatic_water_body_fraction&skipPreview=true
  2. http://localhost:3000/C1268452378-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&granuleId=G1268452388-EEDTEST&subset=lat(54%3A72)&subset=lon(2%3A42)&format=application%2Fx-netcdf4&variable=Soil_Moisture_Retrieval_Data_AM%2Falbedo%2CSoil_Moisture_Retrieval_Data_AM%2Fsurface_flag&skipPreview=true
  • 3D variables do not work (pending DAS-2238).
  • New unit tests have not been added; the current unit tests are passing.
  • DAS-2236 has been written to handle fill values in the corners.
  • Jupyter test notebooks exist for SMAP L3 and need to be updated.

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • [x] CHANGELOG.md updated to include a high-level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

hoss/spatial.py (outdated; resolved)
@@ -55,6 +59,25 @@ def is_index_subset(message: Message) -> bool:
)


Collaborator Author:

Need a more configurable way to override dimensions.

Collaborator Author:

To be addressed as part of DAS-2241 (VarInfo) and DAS-2238 (HOSS).

@sudha-murthy sudha-murthy marked this pull request as draft September 12, 2024 17:40
@sudha-murthy sudha-murthy marked this pull request as ready for review September 13, 2024 05:09
"false_easting": 0.0,
"false_northing": 0.0
}
],
@autydp (Contributor) commented Sep 17, 2024:

I'm not hard over on this, but I had thought the grid-mapping variables would be handled slightly differently, and I guess I didn't catch this in the VarInfo review. I thought the applicability reference would be to a "phantom" variable reference (name fully enumerated, path possibly a pattern match or fully enumerated), with the attribute declarations as above (not as a "hard-coded" Grid_Mapping_Data array). This declaration might have to be declared following rules that establish a reference to this variable, or the order might not matter. In any case, the phantom variable would have to exist as an attribute reference value from one or more science variables. VarInfo would be able to create attributes on this phantom variable, and the application (HOSS) could ask for the attributes of this variable, even if the variable itself did not exist in VarInfo's variable list. This might actually be the VarInfo 3.0 changes, so moot at this point, but I would want to check on that.

Collaborator Author:

Will be done as part of DAS-2256. Updated the ticket with the above information.

Member:

@autydp - this diff is indicating the removal of Grid_Mapping_Data (as it is red). What you are describing is indeed what the code should be doing, and I think what @sudha-murthy has done.

Even prior to earthdata-varinfo==3.0.0 (which cuts the Grid_Mapping_Data section from the configuration file schema), it is possible to express the grid mapping as attributes, and that's what @sudha-murthy has done lower in the configuration file. (See the big green block starting around L220)

@@ -73,38 +113,133 @@ def prefetch_dimension_variables(
"""
required_dimensions = varinfo.get_required_dimensions(required_variables)
if len(required_dimensions) == 0:
required_dimensions = get_override_dimensions(varinfo, required_variables)
logger.info('coordinates: ' f'{required_dimensions}')
Contributor:

This might change, along with the get_override_dimensions method, assuming that for the "3D" fixes we introduce the "Dimensions" attribute as a special-case CF override. Then required_dimensions might not be zero, but we would still have to recognize the special case of a "dimension" without a dimension variable.

Collaborator Author:

Will be addressed in DAS-2238. Added the above comments to the ticket.

@flamingbear (Member) left a comment:

Sudha, there's a lot here to review, and unfortunately I haven't worked with HOSS directly before. I tried to look through what I could and make decent suggestions.

But for sure, you need to test your changes. When you run the tests, a coverage report is generated. Before your changes the results were:

Test Coverage Estimates
Name                           Stmts   Miss  Cover
--------------------------------------------------
hoss/dimension_utilities.py      156      2    99%
hoss/spatial.py                   61      0   100%
--------------------------------------------------
TOTAL                            668      8    99%

And after they dropped considerably:


Test Coverage Estimates
Name                           Stmts   Miss  Cover
--------------------------------------------------
hoss/dimension_utilities.py      242     65    73%
hoss/spatial.py                   70      8    89%
--------------------------------------------------
TOTAL                            771     81    89%

It is very difficult to understand the changes when I can't look at a test and see what a function is supposed to do. Likewise, the function comments were not updated to describe the new functionality. Hopefully I'm not confusing anything with my lack of familiarity. I will defer to Owen if there are differences of opinion on what should be done.

The final test instructions assume a strong understanding of the problem you were solving, but I was eventually able to run the two test URLs and get output. Should the output files be geolocated somehow? They open in Panoply, but they aren't geo-2D, just 2D.

Comment on lines +4 to +6
This version of HOSS provides support for products without CF compliance like SMAP L3
Methods added to get dimension scales from coordinate attributes and grid mapping with overrides
in the json file
Member:

This is where we keep track of all the changes to the repository, so it should be worded carefully, with full sentences and punctuation.

Specifically, I would change "without CF compliance" -> "that do not comply with CF standards"

Include the methods that are added by name.

I don't fully understand this: "dimension scales from coordinate attributes and grid mapping with overrides in the json file". There's something wrong with the grammar that is confusing me. OK, I think you're saying: "Methods are added to retrieve dimension scales from coordinate attributes and grid mappings, using overrides specified in the earthdata-varinfo configuration file."

Instead of just saying "json file", say the earthdata-varinfo configuration file, hoss_config.json.

@flamingbear (Member) commented Sep 24, 2024:

As a rule, you usually put the 👍 emoji after you've made the change, as an alert to me that you saw my comment, agreed with it, and changed it. When I see the emoji I expect to be able to look at the change and then resolve my comment. It also lets you keep track of which comments you have actually taken care of.

Member:

But since it looks like you've thumbed up a bunch of things, you can add a comment with the git hash as a way to alert the commenter that you have addressed the issue, when you get there.

projection_variable_name = 'projected_y'
elif override_variable.is_longitude():
projection_variable_name = 'projected_x'
return projection_variable_name
Member:

Can this ever return None? If so, you need to update the mypy return type.
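The reviewer's point, as a minimal sketch (the stripped-down signature is hypothetical; the real function takes varinfo arguments): when neither branch assigns a value, the function implicitly returns None, so the annotation must say so:

```python
from typing import Optional

def get_projection_variable_name(
    is_latitude: bool, is_longitude: bool
) -> Optional[str]:
    """Returns 'projected_y', 'projected_x', or falls through to None,
    so the mypy return type must be Optional[str] (str | None on 3.10+)."""
    projection_variable_name = None
    if is_latitude:
        projection_variable_name = 'projected_y'
    elif is_longitude:
        projection_variable_name = 'projected_x'
    return projection_variable_name
```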

required_variables, 'coordinates'
)
return override_dimensions
except AttributeError:
Member:

Does get_references_for_attribute raise this exception? I don't see how that happens, but I could be wrong.

Collaborator Author:

Yes, it does raise this exception if it does not find the attribute. I added that when I was testing. I will check when it happens and document it.

@@ -73,38 +113,133 @@ def prefetch_dimension_variables(
"""
required_dimensions = varinfo.get_required_dimensions(required_variables)
if len(required_dimensions) == 0:
required_dimensions = get_override_dimensions(varinfo, required_variables)
logger.info('coordinates: ' f'{required_dimensions}')
Member:

Why is this 'coordinates:'? Should it be showing the required_dimensions?

Collaborator Author:

In this case the required_dimensions are coordinate variables; get_override_dimensions basically returns coordinate variables, so I am renaming that method to get_coordinate_variables.
The log message just shows that we did not get named dimensions, but got the coordinates instead.

Comment on lines +140 to +141
dimensions_array = prefetch_dataset[dimension_variable.full_name_path][:]
return dimensions_array.ndim == 1
Member:

Do you have to get the data from the Dataset here? I think you can access ndim without evaluating the array, with just prefetch_dataset[dimension_variable.full_name_path].ndim.

if grid_mapping_variable is None:
# check for any overrides
cf_attributes = varinfo.get_missing_variable_attributes(grid_mapping)
if len(cf_attributes) != 0:
Member:

Suggested change
if len(cf_attributes) != 0:
if cf_attributes:

again more pythonic.

Member:

This whole function seems weirdly structured.

Why are you getting the grid mapping for the override case after you've already found a valid grid_mapping?

Wouldn't it make more sense to take a grid_mapping and make a CRS like it always did, and then, only if it couldn't, go looking for an override?

For this piece of code:

            grid_mapping_variable = varinfo.get_variable(grid_mapping)
            if grid_mapping_variable is None:
                # check for any overrides
                cf_attributes = varinfo.get_missing_variable_attributes(grid_mapping)
                if len(cf_attributes) != 0:
                    crs = CRS.from_cf(cf_attributes)
                    return crs
            crs = CRS.from_cf(varinfo.get_variable(grid_mapping).attributes)

It seems like in the previous version of this function, if there was a grid_mapping we tried to
make a CRS from the attributes, and if we had no attributes we raised an error.

Now, if you have a grid_mapping you first check to see if it's a real
variable, and if not, you check for a totally fake override and make a CRS from that.

I feel like the right thing is to try to make a CRS from the grid mapping you
already have. And when that fails, you should go down the path to find out whether
it was because the grid_mapping was fake and you need to grab it with get_missing_variable_attributes.
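A sketch of the suggested control flow (the varinfo method names follow those quoted above, but the stub class and error message are illustrative): try the real grid-mapping variable first, and only consult the configured overrides when it is absent:

```python
from pyproj import CRS

def get_variable_crs(grid_mapping, varinfo):
    """Build a CRS from a real grid-mapping variable when it exists,
    falling back to configuration-file overrides for phantom variables."""
    grid_mapping_variable = varinfo.get_variable(grid_mapping)
    if grid_mapping_variable is not None:
        return CRS.from_cf(grid_mapping_variable.attributes)

    # The grid mapping only exists as overrides in hoss_config.json:
    cf_attributes = varinfo.get_missing_variable_attributes(grid_mapping)
    if cf_attributes:
        return CRS.from_cf(cf_attributes)

    raise ValueError(f'No grid mapping metadata found for {grid_mapping}')

class StubVarInfo:
    """Minimal stand-in: the grid mapping is only present as overrides."""
    def get_variable(self, variable_name):
        return None

    def get_missing_variable_attributes(self, variable_name):
        return {'grid_mapping_name': 'latitude_longitude'}

crs = get_variable_crs('/EASE2_global_projection', StubVarInfo())
```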

Member:

It would be great to see a test where this new path is followed. There is already a set of tests for this function, and it would fit right in: https://github.com/nasa/harmony-opendap-subsetter/blob/DAS-2232/tests/unit/test_projection_utilities.py#L195

Collaborator Author:

Will redo that method (not sure why I made it complicated) and add a test as well.

@@ -102,8 +104,7 @@ def get_spatial_index_ranges(
index_ranges[dimension] = get_geographic_index_range(
dimension, varinfo, dimensions_file, bounding_box
)

if len(projected_dimensions) > 0:
elif len(projected_dimensions) > 0:
Member:

Suggested change
elif len(projected_dimensions) > 0:
elif projected_dimensions:

Member:

This function has changed; the description should change too.

return index_ranges
else:
override_dimensions = get_override_dimensions(varinfo, required_variables)
if len(override_dimensions) > 0:
Member:

Suggested change
if len(override_dimensions) > 0:
if override_dimensions:

varinfo, non_spatial_variable
)

if len(override_dimensions) == 0:
@flamingbear (Member) commented Sep 20, 2024:

Suggested change
if len(override_dimensions) == 0:
if not override_dimensions:

Member:

Just skimming through: isn't this the opposite? Wouldn't it be if not override_dimensions?

Member:

nice catch.

Comment on lines +122 to +132
for non_spatial_variable in non_spatial_variables:
index_ranges.update(
get_projected_x_y_index_ranges(
non_spatial_variable,
varinfo,
dimensions_file,
index_ranges,
bounding_box=bounding_box,
shape_file_path=shape_file_path,
override_dimensions=override_dimensions,
)
Member:

Again I'm a bit stumped. Couldn't you just simplify it to this? Tests would be great, because then I could have just tried it.

def get_spatial_index_ranges(
    required_variables: Set[str],
    varinfo: VarInfoFromDmr,
    dimensions_path: str,
    harmony_message: Message,
    shape_file_path: str = None,
) -> IndexRanges:
    """Return a dictionary containing indices that correspond to the minimum
    and maximum extents for all horizontal spatial coordinate variables
    that support all end-user requested variables. This includes both
    geographic and projected horizontal coordinates:

    index_ranges = {'/latitude': (12, 34), '/longitude': (56, 78),
                    '/x': (20, 42), '/y': (31, 53)}

    If geographic dimensions are present and only a shape file has been
    specified, a minimally encompassing bounding box will be found in order
    to determine the longitude and latitude extents.

    For projected grids, coordinate dimensions must be considered in x, y
    pairs. The minimum and/or maximum values of geographically defined
    shapes in the target projected grid may be midway along an exterior
    edge of the shape, rather than a known coordinate vertex. For this
    reason, a minimum grid resolution in geographic coordinates will be
    determined for each projected coordinate variable pairs. The input
    bounding box or shape file will be populated with additional points
    around the exterior of the user-defined GeoJSON shape, to ensure the
    correct extents are derived.

    """
    bounding_box = get_harmony_message_bbox(harmony_message)
    index_ranges = {}

    geographic_dimensions = varinfo.get_geographic_spatial_dimensions(
        required_variables
    )
    projected_dimensions = varinfo.get_projected_spatial_dimensions(required_variables)
    non_spatial_variables = required_variables.difference(
        varinfo.get_spatial_dimensions(required_variables)
    )
    override_dimensions = get_override_dimensions(varinfo, required_variables)
    

    with Dataset(dimensions_path, 'r') as dimensions_file:
        if len(geographic_dimensions) > 0:
            # If there is no bounding box, but there is a shape file, calculate
            # a bounding box to encapsulate the GeoJSON shape:
            if bounding_box is None and shape_file_path is not None:
                geojson_content = get_shape_file_geojson(shape_file_path)
                bounding_box = get_geographic_bbox(geojson_content)

            for dimension in geographic_dimensions:
                index_ranges[dimension] = get_geographic_index_range(
                    dimension, varinfo, dimensions_file, bounding_box
                )
        elif projected_dimensions or override_dimensions:
            for non_spatial_variable in non_spatial_variables:
                index_ranges.update(
                    get_projected_x_y_index_ranges(
                        non_spatial_variable,
                        varinfo,
                        dimensions_file,
                        index_ranges,
                        bounding_box=bounding_box,
                        shape_file_path=shape_file_path,
                        override_dimensions=override_dimensions
                    )
                )
        return index_ranges

Collaborator Author:

I guess I could. Owen is suggesting writing a separate method rather than re-using get_projected_x_y_index_ranges for the override case. Will look into this more.

@owenlittlejohns (Member) left a comment:

Thanks @sudha-murthy for putting a lot of effort into this PR!

This review is not 100% thorough. I think there are a few bigger things here that need to be addressed, and that will make it easier to digest in full:

  • I think the new functionality to deal with overriding dimensions should not just go in get_projected_x_y_index_ranges. By doing so, it's adding complexity to that function, which makes things harder to understand overall. I also think it is forcing you to do some things that would otherwise be unnecessary; for example, I don't think you need to write the variables to the NetCDF-4 file, you just need to ask a function for 1-D dimension variables and spit out 1-D numpy arrays.
  • I'm really worried about the use of set objects to contain overriding dimensions. The ordering of dimensions is important, and must match the underlying array axes. I suspect this is the reason you are having issues with 3-D variables (it's certainly part of the issue).
  • A lot of the additional code is written in ways that adds high levels of cyclomatic complexity. I've suggested a few ways to cut that down, but it's worth taking a step back and working out what is common to code branches and what is truly different. Try to minimise what has to go in an if/elif/else, and then make sure each branch gives you the same output. The add_index_ranges function is perhaps the area of most concern.
  • There are a few places where the code branches and only assigns variables in one branch which are then used after the branching is finished. This will cause errors when something skips that block and later tries to access a non-existent variable.
  • This PR has no new or updated unit tests, either for new functions or new branches of code. I know I'm religious about these things, but they are absolutely needed, particularly on a PR of this scope. There are a bunch of changes, and they are complicated and hard to understand. Not only do tests make sure things work, reading them makes it easier to understand how the code flows.
  • There are a lot of places where code has changed significantly, but the documentation strings describing the functions have not. They should definitely be updated to keep in sync with what's going on.

Sorry for all the comments, but hopefully they are helpful!
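The concern about sets losing dimension ordering can be made concrete with a short sketch (dimension names and index ranges are illustrative):

```python
# Dimension order must match array axes: axis 0 is time, 1 is y, 2 is x.
dimensions = ['/time', '/y', '/x']
index_ranges = {'/y': (10, 20), '/x': (30, 40)}

# A list preserves axis order, so slices line up with the right axes,
# defaulting to "take everything" for dimensions with no index range:
slices = tuple(
    slice(*index_ranges[dimension]) if dimension in index_ranges else slice(None)
    for dimension in dimensions
)
assert slices == (slice(None), slice(10, 20), slice(30, 40))

# A set holds the same members but guarantees no particular iteration
# order, so building slices from it can pair a range with the wrong axis:
assert set(dimensions) == {'/time', '/x', '/y'}
```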


This version of HOSS provides support for products without CF compliance like SMAP L3
Methods added to get dimension scales from coordinate attributes and grid mapping with overrides
in the json file
Member:

Given the extent of the changes in this PR, I think this description needs a lot more detail. It should include what specifically has been added (e.g., function names). This also doesn't really describe the meat of the new method for deriving things (e.g., what sort of CF-Convention compliance is missing and that we are now using 2-D coordinate information in place of missing 1-D grid dimension variables).

@@ -1 +1 @@
1.0.5
1.1.0
Member:

Yeah, a minor version change feels like the right call here. It's not an API change, but we are adding new functionality in a backwards compatible way.

@@ -55,6 +59,42 @@ def is_index_subset(message: Message) -> bool:
)


def get_override_projected_dimensions(
Member:

This PR does not include any changes to unit tests. Each new function you are adding (or new branch of code within an existing conditional block) needs unit testing that hits every branch of the code within a function. I've not reviewed any of the code changes in detail yet, but this lack of tests is an absolute blocker for me to approve the PR.

Collaborator Author:

Yes. Owen. Will add the unit tests.

f'{dimensions_path}'
', shapefilepath:'
f'{shape_file_path}'
)
Member:

I think this is a redundant logging statement (see: L82 of this file on the main branch). The names of the files that are downloaded are also logged as part of the harmony-py call to download a file (including the start time and the end time of the download).

@@ -131,6 +145,9 @@ def subset_granule(
shape_file_path,
)
)
logger.info(
'subset_granule - index_ranges:' f'{format_dictionary_string(index_ranges)}'
)
Member:

This feels like another redundant logging statement. We already have statements that log:

  • The requested variables.
  • The variables retrieved from OPeNDAP in the prefetch.
  • The required variables.
  • The final set of variables with the index ranges.

This statement doesn't add any new information.


if dimension_range is not None and dimension_range[0] <= dimension_range[1]:
range_strings.append(f'[{dimension_range[0]}:{dimension_range[1]}]')
if variable.dimensions == []:
Member:

Minor thought: this behaviour (having no dimensions) is the special case that happens rarely, so it makes more logical sense for it to come second. When re-reading the code, the flow will make more sense if the most commonly occurring case is read first.

range_strings.append(f'[{dimension_range[0]}:{dimension_range[1]}]')
if variable.dimensions == []:
override_dimensions = get_override_dimensions(varinfo, [variable_name])
if len(override_dimensions) > 0:
Member:

You don't need to check for the length being greater than zero, if you are iterating through the list immediately after. If the length is zero, the for-in loop will just become a no-op.
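As a standalone illustration of this point (the names are made up):

```python
override_dimensions = []  # e.g. nothing returned for this variable
visited = []

# The length guard is redundant...
if len(override_dimensions) > 0:
    for dimension in override_dimensions:
        visited.append(dimension)

# ...because iterating an empty collection is already a no-op:
for dimension in override_dimensions:
    visited.append(dimension)

assert visited == []
```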

Collaborator Author:

It is there to check the else case, where the variable is the same as the override or the requested variable is the coordinate variable itself.

if variable.dimensions == []:
override_dimensions = get_override_dimensions(varinfo, [variable_name])
if len(override_dimensions) > 0:
for override in reversed(list(override_dimensions)):
@owenlittlejohns (Member) commented Sep 20, 2024:

Why are you reversing things here? The ordering should be preserved. If the order of override dimensions does not reflect the ordering of the array axes, then that needs to be fixed.

Collaborator Author:

I will revisit that. I had a problem with fixing the override dimensions list.

@@ -422,22 +559,48 @@ def add_index_range(
"""
Member:

For this function: READ THIS COMMENT FIRST:

I think this is really overcomplicated. Fundamentally, all this function is doing is getting a list of dimensions for a variable. Then it is looking in a cache to see if there is an index range for any of the dimension names, before using that cache value to tack something on the end of a string.

Maybe I'm being naive, but it feels like all you really need is something to determine the correct list of variable dimensions; then all the rest of the logic (looking in the cache and appending strings to the end of the variable name) is the same. That latter stuff 100% does not need to be duplicated in multiple condition branches. It's making this function unnecessarily hard to read.

The other, really important comment: I am super suspicious of the bit where you are needing to reverse the order of the dimensions list. However that is derived, it should reliably be in the ordering of the variable axes. I'm wary that what this means is that you have found that, for your particular use-case, the "random" ordering in a set is predictable for the pseudo-dimensions you have for SMAP L3, and you can coincidentally impose the order you need by reversing the set. I really think dimensions should not be passed around in sets, because you need that ordering. I strongly suspect this is the root cause of your issues with 3-D variables.

Collaborator Author:

The ordering and the shape are things I need to get from varinfo; I don't have that information. Updated DAS-2241 for this.

@@ -549,7 +712,10 @@ def get_dimension_bounds(
be returned.
"""
bounds = varinfo.get_variable(dimension_name).references.get('bounds')
try:
Member:

It might be good to add a comment on the line after the try:

# For pseudo-variables, `varinfo.get_variable` returns `None` and therefore has no `references` attribute.

(My first thought looking at this line was that the ...get('bounds') is a dict.get, and so you don't need to guard against an AttributeError, but the problem is earlier in the line of code.)
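A self-contained sketch of the failure mode being guarded against (the stub class is hypothetical; in HOSS the object comes from varinfo.get_variable):

```python
class StubVariable:
    """Stand-in for a varinfo variable with a bounds reference."""
    references = {'bounds': '/lat_bnds'}

def get_bounds_reference(variable):
    try:
        return variable.references.get('bounds')
    except AttributeError:
        # For pseudo-variables, `varinfo.get_variable` returns `None`,
        # and it is None.references (not dict.get) that raises here.
        return None

assert get_bounds_reference(StubVariable()) == '/lat_bnds'
assert get_bounds_reference(None) is None
```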

def get_override_projected_dimensions(
varinfo: VarInfoFromDmr,
override_variable_name: str,
) -> str:
Contributor:

) -> str | None

Contributor:

The review is a bit disjointed now with all the comments. Here is my suggestion:

  • Rename prefetch_dimension_variables to prefetch_variables
  • Expand the description to describe how prefetch_variables will prefetch dimension-variables when these exist, and, as an alternative, will prefetch coordinate variables, later used to calculate dimension-scale values (pseudo-dimension-variables).
  • I would put prefetch_variables before the above functions, after is_index_subset, just to provide some context for these following functions.
  • Rename get_override_dimensions to get_coordinate_variables

4 participants