FixedScaleOffset cannot be combined with BitRound to reduce quantization errors as expected #549

Open
point9repeating opened this issue Jul 12, 2024 · 0 comments

import numpy as np
import numcodecs

# offset=273.15 K, scale=1.0; both the decoded and the encoded dtype are float32
codec = numcodecs.fixedscaleoffset.FixedScaleOffset(offset=273.15, scale=1.0, dtype='f4', astype='f4')
# temperature in K
temperature = np.array([263.05, 273.05, 273.35, 283.25, 293.55, 304.05, 313.94998], dtype=np.float32)
# offset temperature from K to degrees C; note the result is rounded to whole degrees
codec.encode(temperature)
Out[110]: array([-10.,  -0.,   0.,  10.,  20.,  31.,  41.], dtype=float32)

Problem description

I'm looking at implementing lossy compression with the BitRound filter for some large weather datasets stored in Zarr. Some parameters are stored in units that put all values far from zero, well outside a range such as [2^0, 2^1]. One example is temperature in Kelvin. Because BitRound keeps a fixed number of mantissa bits, its absolute error grows with the magnitude of the values, so the quantization errors are larger than they need to be in such cases.

If I could offset the data to a more reasonable range, I could achieve smaller quantization errors. It looked like FixedScaleOffset would be just the ticket after I saw that it accepts an astype argument. Unfortunately, FixedScaleOffset always rounds the data to integers before casting to that type. I tested a local implementation of FixedScaleOffset and found that removing this rounding achieved the desired behavior.

I would like to chain the FixedScaleOffset and BitRound filters in a way that minimizes quantization errors. In one local test, I used bit rounding with keepbits=8 for a temperature array. The maximum quantization errors were +/-0.5 degrees. Using FixedScaleOffset without integer rounding, these errors were reduced to +/-0.0625 degrees.
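
To illustrate where those two error bounds come from, here is a minimal NumPy-only sketch. The `bitround` helper is a simplified stand-in written for this example (round-to-nearest on the float32 mantissa); it is not the numcodecs BitRound implementation. The sample temperature range is also an assumption chosen to resemble the array above.

```python
import numpy as np

def bitround(values, keepbits):
    """Keep `keepbits` mantissa bits of float32 values (illustrative helper)."""
    a = np.array(values, dtype=np.float32)      # work on a copy
    b = a.view(np.uint32)                       # reinterpret the bits in place
    maskbits = 23 - keepbits                    # float32 has 23 mantissa bits
    mask = np.uint32((0xFFFFFFFF >> maskbits) << maskbits)
    half_quantum1 = np.uint32((1 << (maskbits - 1)) - 1)
    # add just under half a quantum, plus the lowest kept bit (ties to even)
    b += ((b >> np.uint32(maskbits)) & np.uint32(1)) + half_quantum1
    b &= mask                                   # zero the discarded bits
    return a

# assumed sample data: temperatures between 263 K and 314 K
kelvin = np.linspace(263.0, 314.0, 2001, dtype=np.float32)
celsius = kelvin - np.float32(273.15)           # offset only, no integer rounding

err_kelvin = np.abs(bitround(kelvin, 8) - kelvin).max()
err_celsius = np.abs(bitround(celsius, 8) - celsius).max()
print(err_kelvin, err_celsius)
```

Values in [256, 512) have a spacing of 2^8 * 2^-8 = 1 after keeping 8 mantissa bits, so the error is at most 0.5. After the offset, the largest magnitudes fall in [32, 64), where the spacing is 2^5 * 2^-8 = 0.125 and the error is at most 0.0625.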

Potential enhancement

We could check that the astype argument is an integer dtype. If it is, we apply rounding. Otherwise, we leave the data alone.

Or, we could add an optional argument to FixedScaleOffset that controls whether or not rounding to integers is applied and default that to True for backwards compatibility.
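
Both options could look roughly like the sketch below. `encode_scale_offset` and its `round_to_int` parameter are hypothetical names made up for this illustration; this is a standalone function, not the numcodecs code. Here `round_to_int=None` means "round only when astype is an integer dtype" (the first option), while passing True or False sets the behavior explicitly (the second option). For strict backwards compatibility the default would have to be True rather than None.

```python
import numpy as np

def encode_scale_offset(buf, offset, scale, dtype="f4", astype="f4", round_to_int=None):
    """Hypothetical encode step: (x - offset) * scale with optional rounding."""
    arr = np.asarray(buf).astype(dtype, copy=False).reshape(-1)
    enc = (arr - offset) * scale
    if round_to_int is None:
        # first option: infer the behavior from the target dtype
        round_to_int = np.issubdtype(np.dtype(astype), np.integer)
    if round_to_int:
        enc = np.around(enc)
    return enc.astype(astype, copy=False)

temperature = np.array([263.05, 273.05, 273.35], dtype=np.float32)

# explicit rounding, matching the behavior shown at the top of this issue
rounded = encode_scale_offset(temperature, 273.15, 1.0, astype="f4", round_to_int=True)
# float astype with inferred behavior: fractional degrees are preserved
unrounded = encode_scale_offset(temperature, 273.15, 1.0, astype="f4")
# integer astype with inferred behavior: values are still rounded before the cast
as_int = encode_scale_offset(temperature, 273.15, 10, astype="i2")
```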

point9repeating pushed a commit to point9repeating/numcodecs that referenced this issue Aug 9, 2024
…ing this filter without automatically converting to integers. (zarr-developers#549)