Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some fixes for OPeNDAP attributes #48

Merged
merged 18 commits into from
May 17, 2024

Conversation

noaaroland
Copy link
Contributor

It seems that double quotes in a String attribute value need to be escaped. The THREDDS Data Server escapes double quotes and the OPeNDAP client I tested (netCDF 4 C-library) fails when reading an attribute with a double quote.

Some old-school netCDF files have np.float32 as attribute values so these need to be accounted for when building the OPeNDAP object (other data types too probably).

abkfenris and others added 3 commits June 2, 2023 12:11
I've gotten testing setup so that we can spin up Xpublish in the background and test real clients against it.

Works on xpublish-community#18
@abkfenris
Copy link
Member

Thanks for the PR @noaaroland !

It would be great to add a test with example problematic attributes, though I think we would need to get #20 working so that we can serve a dataset to clients that would have issues.

@noaaroland
Copy link
Contributor Author

Where does this data set dap_client.get("/datasets/air/opendap.dds") come from? Is it part of the xpublish distribution?

I have small data files which illustrate the problem. How would I reference them from a test?

@abkfenris
Copy link
Member

I'm guessing you are looking at?

def test_dds_response(dap_client):
response = dap_client.get("/datasets/air/opendap.dds")

That's using the Py.test fixture system so that we can reuse shared code across multiple tests. Those fixtures are setup here:

@pytest.fixture(scope="session")
def dataset():
from xarray.tutorial import open_dataset
ds = open_dataset("air_temperature")
return ds
@pytest.fixture(scope="session")
def dap_xpublish(dataset):
rest = xpublish.Rest({"air": dataset}, plugins={"opendap": OpenDapPlugin()})
return rest
@pytest.fixture(scope="session")
def dap_client(dap_xpublish):
app = dap_xpublish.app
client = TestClient(app)
return client

We're setting up a mock server and client that we can make some requests of.

It doesn't actually expose a port, so we can't pass it to Xarray or NetCDF and try to see how different engines parse things. That's what I tried to do in #20

Copy link

codecov bot commented Dec 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.92%. Comparing base (edd1701) to head (aa8b1a1).
Report is 76 commits behind head on main.

Current head aa8b1a1 differs from pull request most recent head 2f9d5b7

Please upload reports for the commit 2f9d5b7 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   92.04%   92.92%   +0.88%     
==========================================
  Files           3        3              
  Lines          88       99      +11     
==========================================
+ Hits           81       92      +11     
  Misses          7        7              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@noaaroland
Copy link
Contributor Author

I'm not sure what you're telling me regarding the getting #20 working or how I could help, but it occurs to me that the change I made relates to a data which which at a word in double quotes in a global attribute. See the word history in history_of_appended_files global attribute here: https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_CORECFS/Level1/1970-1974/B10K-K20_CORECFS_1970-1974_average_temp.nc.das

Perhaps one could test that case by adding such an attribute to the data set you're building. E.g.

@pytest.fixture(scope="session")
def dataset():
    from xarray.tutorial import open_dataset
    ds = open_dataset("air_temperature")
    ds.attrs['quattr'] = 'This attributes uses "quotes"'
    return ds

The other modifications I made were because attributes were falling through to the else which were not strings. Eventually I went testing for instanceof str directly, and falling through to the else as it was before. I feel like this might not be ideal in the end.

Let me know specifically how you'd like me to help out and I'll try to help improve the plugin.

Builds upon xpublish-community#48 by merging in the live server testing from xpublish-community#20 (with a few of the failing tests disabled from there that are in response to another issue).

The live server has an additional dataset that is specifically focused on testings if attributes with double quotes are correctly round tripped via the OpenDAP plugin.

Merge branch 'test-server' into quote-with-server
@abkfenris
Copy link
Member

Ok, maybe I didn't get my thoughts down well.

I just made a PR to your branch with most of the changes from #20 merged in. I went and disabled the changes from #20 that were causing failing tests (what we were trying to figure out in #18 ), but left the bits active that allow us to run a live server in our testing process.

That allows us to see how real clients interpret things like attributes (and may help with debugging your question here too xpublish-community/xpublish#246 ).

With the test server, I created a new dataset for the server to test how quotes get formatted in attributes, and a test that checks that.

Hopefully that makes a bit more sense.

If that sounds good, you can land noaaroland#1 onto your branch. Then I would suggest that you add datasets and tests for each one of the attribute format conversions to make sure they can be appropriately formatted by the server and loaded on the client.

tests/server.py Outdated Show resolved Hide resolved
@aufdenkampe
Copy link

aufdenkampe commented Apr 26, 2024

@abkfenris, it seems that the two issues just posted by @dblodgett-usgs might be fixed by this PR:

Note that the endpoint that @dblodgett-usgs was testing is our deployment of https://github.com/xpublish-experiments/Catalog-To-Xpublish developed by @xaviernogueira and that @sjordan29 will maintain.

How might we help move this PR forward?

@abkfenris
Copy link
Member

@abkfenris, it seems that the two issues just posted by @dblodgett-usgs might be fixed by this PR:

* [json view of dataset encoded as string? #57](https://github.com/xpublish-community/xpublish-opendap/issues/57)

* [Issue escaping string attributes with `"` characters? #58](https://github.com/xpublish-community/xpublish-opendap/issues/58)

Note that the endpoint that @dblodgett-usgs was testing is our deployment of https://github.com/xpublish-experiments/Catalog-To-Xpublish developed by @xaviernogueira and that @sjordan29 will maintain.

How might we help move this PR forward?

Whoops, I hadn't seen that there had been more progress here.

The first issue I believe is with a Catalog-to-Xpublish specific route, so I'm going to try to nudge that over there, but you're right in that the second one may be solved by this.

@noaaroland have you managed to deal with your attribute issues with this? Sorry I had missed that you had put further work into this.

@noaaroland
Copy link
Contributor Author

I could get the code to run without error based on the changes I made to the way attributes are treated, but it does still not return the correct OPeNDAP data structures for my use case. See: xpublish-community/xpublish#246

If you can whip this pull request into shape and use it please do, but I pretty much moved on when I didn't get any feedback.

@abkfenris
Copy link
Member

Sorry, I don't know quite enough about how Xarray decides if something should be an coordinate and/or dimension when reading OpenDAP, so I pinged some folks offline about xpublish-community/xpublish#246 I'll try to give folks a nudge again to take a look.

I can try to take care of the PR, but it may take me a few weeks as I'm about to head out on an expedition.

@noaaroland
Copy link
Contributor Author

Thank you. In my understanding of an old-school OPeNDAP server it's not really making an decisions about what is a coordinate, data variable or whatnot. The client figures out what's what by applying the convention by interpreting the attributes and looking at the structure (like is a variable the same name as the dimension). The server is just serving the attributes, structure and hunks of data as the client requests.

In this implementation since it's xarray on both ends, it gets confusing as to which piece is responsible for organizing the xarray object according to the conventions.

I hope we can get help sorting it out. Good luck with your expedition.

@abkfenris abkfenris merged commit 4b61311 into xpublish-community:main May 17, 2024
21 checks passed
@abkfenris
Copy link
Member

Thank you again @noaaroland for your work on this, and I'll try to get someone to take a look at the dimensions/coordinates issues you've had.

@sjordan29 sjordan29 mentioned this pull request May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants