Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep fill value for encoding #369

Merged
merged 9 commits into from
Jan 11, 2025
Merged

Conversation

abarciauskas-bgse
Copy link
Collaborator

While trying to create a virtual icechunk store for MUR SST, using the dmrpp reader, I noticed that a mean value coming back from the icechunk store was wrong. (To be clear this issue has nothing to do with icechunk, but the data have to be stored to zarr or icechunk). After some trial and error, I found that the issue was a missing fill value in the encoding of the data variables. This PR fixes the issue by replacing pop with get for the _FillValue in attrs, so they can be passed along in the encoding.

I verified this works by making the same code change in the dmrpp.py file VirtualiZarr dependency of my notebook (run on hub.openveda.cloud and using the quay.io/developmentseed/veda-optimized-data-delivery-image:latest image). The mean value returned from the icechunk.

  • Closes #xxxx n/a
  • Tests added
  • Tests passing
  • Full type hint coverage n/a
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst n/a
  • New functionality has documentation n/a

Please review @ayushnag

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right, thanks @abarciauskas-bgse

Copy link

codecov bot commented Jan 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.90%. Comparing base (bd010c4) to head (ea7e529).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #369   +/-   ##
=======================================
  Coverage   78.90%   78.90%           
=======================================
  Files          32       32           
  Lines        1906     1906           
=======================================
  Hits         1504     1504           
  Misses        402      402           
Files with missing lines Coverage Δ
virtualizarr/readers/dmrpp.py 91.93% <100.00%> (ø)

@ayushnag
Copy link
Contributor

ayushnag commented Jan 3, 2025

@abarciauskas-bgse Thanks for reporting the bug! I'm still not sure if _FillValue should be in attributes since even when virtualizarr is called on a netcdf dataset, the _FillValue is not in the attributes. I was trying to track down where it is removed and I think it happens in kerchunk here. I think perhaps the correct fix is to only remove _FillValue once it has been added to the variable encoding?

@abarciauskas-bgse
Copy link
Collaborator Author

Thanks for looking @ayushnag. Are we sure that _FillValue should be removed? I'm considering (a) I see _FillValue being listed as an attribute in NetCDF conventions and (b) @sharkinsspatial and @rabernat have run into some issues where _FillValue is not present and has also caused issues (see #343 and also the _FillValue).

@abarciauskas-bgse
Copy link
Collaborator Author

@ayushnag @TomNicholas I'm still not 100% sure I understand in what data structures for what backend data formats _FillValue should be in the attrs, as noted in #343 (comment), but am removing _FillValue from attrs based on conversation with @sharkinsspatial and wanting to merge this for now.

It appears there is an upstream failure that I will look into fixing.

@abarciauskas-bgse
Copy link
Collaborator Author

abarciauskas-bgse commented Jan 11, 2025

@mpiannucci @TomNicholas I had to pin icechunk==0.1.0a8 as opposed to icechunk>=0.1.0a8 as version 0.1.0a10 upgrades to zarr 3 which causes a lot of test failures. Similarly, I had to specify xarray>=2024.10.0,<2025.0.0 as xarray 2025.1.1 also had breaking changes. I am planning to work on upgrading to icechunk==0.1.0a10 (and thus the newly released zarr-python 3) so obviously these versions are short-lived 🤞🏽

@abarciauskas-bgse
Copy link
Collaborator Author

I'm going to merge as tests are passing and don't want this to get stale. Will work on the upgrades suggested https://github.com/zarr-developers/VirtualiZarr/pull/375/files

@abarciauskas-bgse abarciauskas-bgse merged commit 908ad94 into main Jan 11, 2025
13 checks passed
@TomNicholas TomNicholas deleted the ab/fix-fill-value-dmrpp branch January 13, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants