FixedScaleOffset for handling NaN inputs #511
Comments
I can certainly imagine an offset for integers, but a scale less so. Of course, you also get to specify the bitsize of each, and it can be important whether you save as uint8 and load into float32 or something bigger.
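To illustrate the integer-offset case mentioned above, here is a minimal sketch in plain NumPy. It is not the numcodecs API; the sample values, the offset of 1000, and the choice of uint8 storage are assumptions made only for the example.

```python
import numpy as np

# Hypothetical integer data clustered well above zero.
values = np.array([1000, 1017, 1200], dtype="i4")
offset = 1000  # assumed offset, chosen so the residuals fit in uint8

# Offset only, no scale: residuals 0..200 fit in one byte each.
stored = (values - offset).astype("u1")

# Decode into a wider integer type before adding the offset back,
# otherwise the addition would overflow uint8.
restored = stored.astype("i4") + offset
```

The storage width matters on the way back as well: adding the offset while still in uint8 would wrap around, so the decode step has to widen first.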
This commit addresses issue zarr-developers#511 by adding support for handling NaN inputs in the FixedScaleOffset class. The changes include:
- Introduced a check to ensure that when a fill_value is provided, the input dtype must be floating-point.
- Prevented the use of integer dtypes for fill_value, which cannot encode NaN values.
- Updated type and casting validation to ensure that fill_value is correctly cast to the specified astype.
- Only support float -> int -> float transformations, as float -> float already natively support NaNs without fill_value
- Added tests for fill_value options
References: zarr-developers#511
We have a lot of use for this, especially to move away from using the "add_offset" and "scale_factor" attributes in xarray and to use the zarr encoding directly instead. I've made some preliminary changes to address this issue in my fork. You can check them out in this branch: fixedscaleoffset-nans.

I added to the tests, except for test_backwards_compatibility. It appears to create files that are not cleaned up after each run, which interferes with the nan/fill_value cases because the codecs are run on all the previously saved datasets. This causes an issue when a scale/offset is applied to an "old" dataset whose values do not make sense with the current codec's offset/scale. I am probably overlooking a way to handle this: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/tests/common.py#L259-L276

    # save fixture data
    for i, arr in enumerate(arrays):
        arr_fn = os.path.join(fixture_dir, f'array.{i:02d}.npy')
        if not os.path.exists(arr_fn):  # pragma: no cover
            np.save(arr_fn, arr)

    # load fixture data
    for arr_fn in glob(os.path.join(fixture_dir, 'array.*.npy')):
        # setup
        i = int(arr_fn.split('.')[-2])
        arr = np.load(arr_fn, allow_pickle=True)
        arr_bytes = arr.tobytes(order='A')
        if arr.flags.f_contiguous:
            order = 'F'
        else:
            order = 'C'

        for j, codec in enumerate(codecs):

Would this be something worth going forward with? Thank you for the help/suggestions.
This commit addresses issue zarr-developers#511 by adding support for handling NaN inputs in the FixedScaleOffset class. The changes include:
- Introduced a check to ensure that when a fill_value is provided, the input dtype must be floating-point.
- fill_value must be an integer dtype
- Updated type and casting validation to ensure that fill_value is correctly cast to the specified astype (eg. fill_value of 3000 cannot cast to int8)
- Only support float -> int -> float transformations, as float -> float already natively support NaNs without fill_value
- Added tests for fill_value options
References: zarr-developers#511
This commit addresses issue zarr-developers#511 by adding support for handling NaN inputs in the FixedScaleOffset class. The changes include:
- Introduced a check to ensure that when a fill_value is provided, the input dtype must be floating-point.
- fill_value must be an integer dtype
- Updated type and casting validation to ensure that fill_value is correctly cast to the specified astype (eg. fill_value of 3000 cannot cast to int8)
- Only support float -> int -> float transformations, as float -> float already natively support NaNs without fill_value
- Added tests for fill_value options
- Added fixtures for fill_value version of fixedscaleoffset
References: zarr-developers#511
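The validation rules listed in the commit message could be sketched roughly as follows. This is not the code from the fork: the helper name `check_fill_value` and the exact exception types are assumptions for illustration, and only standard NumPy calls (`np.dtype`, `np.iinfo`) are used.

```python
import numpy as np

def check_fill_value(dtype, astype, fill_value):
    """Hypothetical helper: validate fill_value against dtype/astype."""
    dtype, astype = np.dtype(dtype), np.dtype(astype)
    if fill_value is None:
        return None  # no NaN handling requested
    # Only float -> int -> float is supported with a fill_value.
    if dtype.kind != "f":
        raise ValueError("fill_value requires a floating-point dtype")
    if astype.kind not in "iu":
        raise ValueError("fill_value requires an integer astype")
    # The fill_value itself must be an integer (bool is excluded explicitly).
    if isinstance(fill_value, bool) or not isinstance(fill_value, (int, np.integer)):
        raise TypeError("fill_value must be an integer")
    # It must also be representable in astype, e.g. 3000 does not fit in int8.
    info = np.iinfo(astype)
    if not (info.min <= fill_value <= info.max):
        raise ValueError(f"fill_value {fill_value} does not fit in {astype}")
    return astype.type(fill_value)
```

Checking the range with `np.iinfo` rather than relying on a cast means an out-of-range value such as 3000 for int8 is rejected up front instead of silently wrapping.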
I was confused in my previous post. I have now added some fixtures for the case where a fill_value is present. Would this be worth a PR, or should this "fill_value" FixedScaleOffset be added as a different filter?
Hi,
I was wondering if you all would be interested in a FixedScaleOffset that can handle np.nan inputs? In the style of HDF/netCDF, a fill value would replace np.nan with an appropriate integer. This could be either user-defined or determined automatically based on the astype integer size (for example, by assigning it the smallest possible integer value).
I can either modify the existing FixedScaleOffset class, or I could create another class. It's a very simple change, though the boolean masking may raise concerns about extra memory usage.
Also, is there any reason why dtype shouldn't always be a float and astype shouldn't always be an integer?
Thanks
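For concreteness, the proposal above could look roughly like the following sketch in plain NumPy. It is not the existing FixedScaleOffset implementation; the function names `encode_with_fill` and `decode_with_fill` are hypothetical, and the default of using the smallest integer of `astype` as the fill value is the assumption described in the post.

```python
import numpy as np

def encode_with_fill(arr, offset, scale, astype, fill_value=None):
    """Scale-offset encode a float array, mapping NaN to an integer sentinel."""
    astype = np.dtype(astype)
    if fill_value is None:
        # Assumption: default to the smallest representable integer.
        fill_value = np.iinfo(astype).min
    arr = np.asarray(arr, dtype="f8")
    mask = np.isnan(arr)  # boolean mask: one extra byte per element
    scaled = np.around((arr - offset) * scale)
    # Substitute the sentinel before casting, so no NaN is cast to an integer.
    scaled[mask] = fill_value
    return scaled.astype(astype)

def decode_with_fill(enc, offset, scale, dtype, fill_value):
    """Invert encode_with_fill, restoring NaN wherever the sentinel appears."""
    enc = np.asarray(enc)
    out = enc.astype(dtype) / scale + offset
    out[enc == fill_value] = np.nan
    return out.astype(dtype)

data = [1.5, float("nan"), 3.25]
enc = encode_with_fill(data, offset=1, scale=100, astype="i2")
dec = decode_with_fill(enc, offset=1, scale=100, dtype="f4", fill_value=-32768)
```

One consequence of this design is that any valid value which happens to encode to the sentinel would decode as NaN, so the fill value should lie outside the range the scaled data can reach.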