Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dataset.services() method to list available services #500

Merged
merged 55 commits into from
Sep 16, 2024

Conversation

nikki-t
Copy link
Collaborator

@nikki-t nikki-t commented Mar 25, 2024

Github Issue: 447

Description

List available Harmony services for a collection. As a first step to facilitating the use of services in earthaccess, earthaccess was modified so that it can list the available services for a collection.

Overview of work done

A new services module was created that performs service queries. The results module was updated to include a services function which will use the services module to query a collection's services and return the provider-id and umm JSON for the service.

Sample services results:

{
    "provider-id": "POCLOUD",
    "umm": {
        "URL": {
            "Description": "the main access point",
            "URLValue": "https://opendap.earthdata.nasa.gov/"
        },
        "Type": "OPeNDAP",
        "ServiceKeywords": [
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "SPATIAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "TEMPORAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "VARIABLE SUBSETTING"
            }
        ],
        "ServiceOrganizations": [
            {
                "Roles": [
                    "DEVELOPER"
                ],
                "ShortName": "UCAR/UNIDATA",
                "LongName": "Unidata, University Corporation for Atmospheric Research"
            }
        ],
        "OperationMetadata": [
            {
                "DistributedComputingPlatform": [
                    "WEBSERVICES"
                ]
            }
        ],
        "Description": "Earthdata OPEnDAP in the cloud",
        "Version": "9",
        "Name": "PO.DAAC Cloud OPeNDAP",
        "ServiceOptions": {
            "Subset": {
                "SpatialSubset": {
                    "BoundingBox": {
                        "AllowMultipleValues": false
                    }
                },
                "TemporalSubset": {
                    "AllowMultipleValues": false
                },
                "VariableSubset": {
                    "AllowMultipleValues": true
                }
            },
            "SupportedReformattings": [
                {
                    "SupportedInputFormat": "NETCDF-4",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                },
                {
                    "SupportedInputFormat": "HDF5",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                }
            ]
        },
        "MetadataSpecification": {
            "URL": "https://cdn.earthdata.nasa.gov/umm/service/v1.5.2",
            "Name": "UMM-S",
            "Version": "1.5.2"
        },
        "LongName": "PO.DAAC OPeNDADP In the Cloud"
    }
}

Overview of verification done

Tested new service functionality on four collections:

  • HLSS30
  • MUR-JPL-L4-GLOB-v4.1
  • VIIRS_NPP-JPL-L2P-v2016.2
  • ATL22
  • TOMSN7SO2

Overview of integration done

Created a new unit test to test DataService get. Unit test coverage:

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
earthaccess/__init__.py                    31      3    90%   77-81
earthaccess/api.py                        107     77    28%   26-28, 65-78, 117, 124, 143-158, 182-193, 211-213, 234-239, 248-252, 261-265, 284-285, 307-308, 328-336, 345-346, 350-355
earthaccess/auth.py                       207     80    61%   17-18, 49-63, 95-96, 122-154, 188-206, 211-215, 242, 251-252, 260-270, 316-317, 346-358, 362-372, 384
earthaccess/daac.py                        20      6    70%   136, 140-147
earthaccess/formatters.py                  17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                    32     26    19%   13-22, 32-58
earthaccess/results.py                    152     81    47%   28-31, 34-47, 88-97, 107-109, 117, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 184-200, 212-220, 223, 258-261, 268-276, 283-284, 287-290, 306-317, 321-329, 355, 379-380
earthaccess/search.py                     305    139    54%   47, 63-72, 88-89, 99-100, 114, 129-133, 146-150, 165-180, 184-186, 194-195, 203-204, 217, 235-236, 244, 276-312, 374, 392-396, 421, 441-442, 450, 458-464, 474-475, 489-498, 511-516, 525-526, 534-535, 543-544, 552-553, 564-565, 573-574, 582, 588, 628, 634-638, 641, 650, 654, 660-662, 665, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                    34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                      289    190    34%   27-28, 31, 34, 42, 50-55, 62-75, 82-86, 109-114, 118-121, 124-125, 128-131, 134-137, 148, 156-159, 183-190, 193-194, 214, 224, 239-242, 292, 310-312, 331, 340-385, 394-440, 467-478, 506, 516-536, 546-582, 595-618, 634-649, 656-665
earthaccess/utils/_validation.py            5      3    40%   5-7
earthaccess/widgets.py                      0      0   100%
tests/unit/__init__.py                      0      0   100%
tests/unit/conftest.py                      0      0   100%
tests/unit/test_auth.py                    60      2    97%   133-134
tests/unit/test_collection_queries.py      30      0   100%
tests/unit/test_formatters.py               0      0   100%
tests/unit/test_granule_queries.py         22      0   100%
tests/unit/test_results.py                 22      0   100%
tests/unit/test_store.py                   58      0   100%
---------------------------------------------------------------------
TOTAL                                    1391    623    55%

========================================================================================================= 26 passed, 2 warnings in 1.75s =========================================================================================================
+ bash ./scripts/lint.sh
+ mypy earthaccess --disallow-untyped-defs
Success: no issues found in 12 source files
+ ruff check .

Create a new integration test that tests the services functionality from the service query to the parsing of query results. Integration test coverage:

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                        Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------
earthaccess/__init__.py                        31      4    87%   68-72, 75
earthaccess/api.py                            107     33    69%   77, 124, 144-152, 190-192, 213, 237-238, 248-252, 261-265, 284-285, 331-334, 336, 345-346
earthaccess/auth.py                           207     59    71%   17-18, 122-154, 188-206, 211-212, 242, 251-252, 260-261, 266, 269, 346-358, 362-372, 384
earthaccess/daac.py                            20      2    90%   136, 147
earthaccess/formatters.py                      17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                        32     26    19%   13-22, 32-58
earthaccess/results.py                        152     58    62%   28-31, 34-47, 88-97, 107-109, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 193, 223, 258-261, 268-276, 283-284, 287-290, 316-317, 321-329, 355, 379-380
earthaccess/search.py                         305     81    73%   69-70, 88-89, 114, 129-133, 146-150, 172, 184-186, 194-195, 203-204, 217, 235-236, 284, 290-294, 297, 304, 392-396, 421, 441-442, 459, 474-475, 490, 511-516, 525-526, 543-544, 552-553, 574, 582, 628, 634-638, 641, 650, 662, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                        34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                          289     89    69%   34, 42, 62-75, 109-114, 118-121, 124-125, 128-131, 159, 183-190, 193-194, 214, 224, 241-242, 312, 331, 345, 373-374, 383-385, 394-440, 468-470, 478, 506, 523-536, 596, 612-617, 635, 637, 660-661
earthaccess/utils/_validation.py                5      0   100%
earthaccess/widgets.py                          0      0   100%
tests/integration/conftest.py                   9      5    44%   8-12
tests/integration/test_api.py                  51      0   100%
tests/integration/test_auth.py                 82     15    82%   56, 64-65, 87-95, 106, 117-118
tests/integration/test_cloud_download.py       80     13    84%   88-89, 130-133, 141-142, 147-156, 169
tests/integration/test_cloud_open.py           82      5    94%   101, 134-135, 159, 169
tests/integration/test_kerchunk.py             41     33    20%   11-87
tests/integration/test_onprem_download.py      82      5    94%   94, 127-128, 159, 161
tests/integration/test_onprem_open.py          75      4    95%   93, 126-127, 151
tests/integration/test_services.py             24      0   100%
-------------------------------------------------------------------------
TOTAL                                        1725    448    74%

================================================================================================================= short test summary info ==================================================================================================================
FAILED tests/integration/test_api.py::test_download[True-0] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_api.py::test_download[True-selection1] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac0] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac1] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac2] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac3] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac4] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_open.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac0] - AssertionError: False is not true
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac1] - AssertionError: False is not true
============================================================================================= 18 failed, 52 passed, 1 skipped, 3 warnings in 174.40s (0:02:54) =============================================================================================

PR checklist:

  • Linted & Formatted
  • Updated unit & integration tests
  • Updated changelog & readme
  • Updated documentation

Pending:

  • Pass currently failing integration tests.

📚 Documentation preview 📚: https://earthaccess--500.org.readthedocs.build/en/500/

@mfisher87 mfisher87 changed the title Feature/issue 447 Add dataset.services() method to list available Harmony services Mar 25, 2024
@nikki-t
Copy link
Collaborator Author

nikki-t commented Mar 25, 2024

It seems like the existing integration tests are failing due to credentials which I had set up as environment variables and in a .netrc file when running them. What is the best way to handle credentials in these tests? Also happy to move this to a draft if we want to work out those details first.

@mfisher87
Copy link
Collaborator

Hope you don't mind the rename, was finding the old name difficult to remember in my notifications :)

@mfisher87
Copy link
Collaborator

mfisher87 commented Mar 25, 2024

What is the best way to handle credentials in these tests?

@betolink @andypbarrett I think this may need documenting! As a developer, how do I run integration tests on my laptop?

@nikki-t
Copy link
Collaborator Author

nikki-t commented Apr 26, 2024

@mfisher87 - I think that I was able to fix the integration test and also modified the services unit and integration tests to use VCR. All unit tests are passing.

I ran the integration tests locally and here is the summary:

==================== short test summary info =================================

FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

====== 7 failed, 71 passed, 1 skipped in 496.39s (0:08:16) ==========================================

I wonder if these tests need to be run in the same AWS region as the data?

Other than that I think this PR can be merged but let me know how that typically works for earthaccess development.

@mfisher87
Copy link
Collaborator

Will you be at hack day this coming week? I may not have time to review this before then 😬

@nikki-t
Copy link
Collaborator Author

nikki-t commented Apr 28, 2024

@mfisher87 - Unfortunately I am on travel this coming week so I can't make it but I do plan to attend the next earthaccess hack day in a few weeks. We can discuss then if it's easier! 😄

@mfisher87
Copy link
Collaborator

Sounds like a good plan :) Safe and fun travels!

@nikki-t
Copy link
Collaborator Author

nikki-t commented Sep 10, 2024

@chuckwondo @betolink @mfisher87 - Any thoughts on the failing Integration Tests and the Python3.9 Unit Test? I implemented the fix recommended by @chuckwondo to decode the compressed VCR response and updated my fork with changes from main.

It seems like the integration tests are failing with an Earthdata login? Maybe this is an issue introduced by merging main with my fork - How does the workflow get Earthdata credentials?

INFO     earthaccess.auth:auth.py:266 Authentication with Earthdata Login failed with:
{"error":"invalid_header","error_description":"Invalid header"}
...
   File "/home/runner/work/earthaccess/earthaccess/tests/integration/test_cloud_download.py", line 67, in <module>
    assertions.assertTrue(auth.authenticated)

And the Unit Test may have to do with the version of Python and the requests library trying to deal with the decompression.

 FAILED tests/unit/test_services.py::TestServices::test_service_results - requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
FAILED tests/unit/test_services.py::TestServices::test_services - requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))

How long do plan to support Python3.9? I am wondering if it doesn't play nicely with the VCR decode_compressed_response parameter. I can also look into removing VCR usage and just call CMR (although the service records do change from time to time so it's nice to freeze the response).

I don't have a ton of time at the moment to continue to troubleshoot but hope to get back into the swing of things in the next month or so.

@chuckwondo
Copy link
Collaborator

chuckwondo commented Sep 11, 2024

You can ignore these integration test failures, which we've been seeing for other PRs as well:

=========================== short test summary info ============================
ERROR tests/integration/test_cloud_download.py - AssertionError: False is not true
ERROR tests/integration/test_cloud_open.py - AssertionError: False is not true
ERROR tests/integration/test_onprem_download.py - AssertionError: False is not true
ERROR tests/integration/test_onprem_open.py - AssertionError: False is not true
!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 4 errors in 2.27s =========================

@mfisher87 and I have been trying to determine why this happens, but for these particular integration test failures, we have seen that once PRs with such failures (and only such failures) are merged, they pass on the merged PRs.

Regarding the unit test failurs for Python 3.9, I'll do some digging. It appears that we may need to do some Python (or urllib3/requests lib) version check to determine whether or not the vcrpy hint I provided should be applied, since the unit tests now fail only for Python 3.9.

I'll follow up once I discover something.

@mfisher87
Copy link
Collaborator

We thought at first that setting the needed secrets on the fork would enable the tests to pass, but nah. I'd like one really clear and definitive resource on how to make testing with forks more accessible. I'd love to contribute something like that to opensource.guide but at this point we have like 4 or 5 weird mysteries or user experience pain points to solve for 😆

@chuckwondo
Copy link
Collaborator

@nikki-t, I managed to get all the unit tests across all python versions to pass locally. It took a bit of wrangling to subdue the issue with vcrpy and gzip compression, but I used your branch as the basis for sorting things out. However, it might be a bit cumbersome for us to have you apply my changes to your branch via comments on your PR, so let me construct a patch for you to apply to your branch. If you don't want to deal with applying a patch, I could open a new PR.

@mfisher87
Copy link
Collaborator

mfisher87 commented Sep 13, 2024

@chuckwondo since Nikki has set this setting, what do you think about just pushing to this PR? Once you add the source repository as a remote, GH will let you push to this branch.

image

@nikki-t
Copy link
Collaborator Author

nikki-t commented Sep 13, 2024

Thanks @chuckwondo!! Let me know if you need me to apply the patch or if you are able to push to the PR like @mfisher87 mentions.

@chuckwondo
Copy link
Collaborator

@nikki-t and @mfisher87, this should do the trick. The failing int. tests are the usual suspects that should pass once merged.

chuckwondo
chuckwondo previously approved these changes Sep 13, 2024
@chuckwondo chuckwondo changed the title Add dataset.services() method to list available Harmony services Add dataset.services() method to list available services Sep 14, 2024
@chuckwondo
Copy link
Collaborator

@mfisher87, @nikki-t, do either of you want to take another look at this now that I've tweaked it to get all the unit tests to pass (and believe the int. tests will also pass once merged)?

@nikki-t
Copy link
Collaborator Author

nikki-t commented Sep 16, 2024

@chuckwondo - This looks good to me. I like the top-level function to search for services.

@chuckwondo
Copy link
Collaborator

@mfisher87, last chance for you to take another look before I merge

@mfisher87
Copy link
Collaborator

Hey Chuck, thanks for reminding me! I am OK with merging based on what I've seen already! But won't have opportunity to do a real "Review" until hack time tomorrow 😬

@chuckwondo
Copy link
Collaborator

Hey Chuck, thanks for reminding me! I am OK with merging based on what I've seen already! But won't have opportunity to do a real "Review" until hack time tomorrow 😬

No worries. Since this has been long-running, and we've all gone over it, I'm going to merge it so we can then focus on more recent PRs.

@chuckwondo chuckwondo merged commit 699cc4e into nsidc:main Sep 16, 2024
6 of 11 checks passed
@mfisher87
Copy link
Collaborator

Wonderful, thank you so much for helping keeping the PRs flowing, Chuck! And of course, Nikki, wonderful work! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hackathon An issue we'd like to visit during a hackathon event
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants