Use System lz4, zstd, and blosc #569

Open
wants to merge 6 commits into
base: main

Conversation

@BwL1289 BwL1289 commented Aug 27, 2024

This PR adds the ability to build the Python bindings against system-provided blosc, zstd, and lz4 (and the zlib-ng that blosc links against) instead of the vendored libraries included in numcodecs.

This PR addresses #464, #264, and by extension the comment in #314.

@BwL1289
Author

BwL1289 commented Aug 30, 2024

@martindurant bumping this if it's not too inconvenient for you. Let me know how I can help support to get it merged.

@martindurant
Member

I may not be the one you want to review this one. Actually I don't understand: does this build wheels which can then be used without the given system libraries?

I have previously argued that, yes, vendoring is unnecessary, but because we can get the same libraries via conda or wheels (e.g., cramjam, which now also supports blosc2, all as static libs; here is a discussion on zlib-ng).

@BwL1289
Author

BwL1289 commented Aug 30, 2024

This allows building numcodecs against already-installed system libraries instead of the vendored ones.

Using vendored libs when you're also compiling and using those same libraries elsewhere can lead to strange, hard-to-debug bugs caused by binary incompatibilities. In such cases it is not enough for the API and ABI to match: objects must be fully identical in memory, including hidden fields (which are affected by defines).
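As a rough illustration of the mismatch risk, the sketch below asks the system libzstd (if there is one) which version it is, so it can be compared against whatever a vendored build was compiled from. It relies only on `ctypes` and zstd's public `ZSTD_versionNumber()` function; the helper names `decode_zstd_version` and `system_zstd_version` are made up for this example and are not part of numcodecs or this PR.

```python
import ctypes
import ctypes.util


def decode_zstd_version(number):
    """Split zstd's packed version number into (major, minor, release).

    zstd packs the version as MAJOR * 10000 + MINOR * 100 + RELEASE.
    """
    return number // 10000, (number // 100) % 100, number % 100


def system_zstd_version():
    """Return the system libzstd version tuple, or None if it cannot be loaded."""
    path = ctypes.util.find_library("zstd")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # ZSTD_versionNumber takes no arguments and returns an unsigned int.
    lib.ZSTD_versionNumber.restype = ctypes.c_uint
    lib.ZSTD_versionNumber.argtypes = []
    return decode_zstd_version(lib.ZSTD_versionNumber())


if __name__ == "__main__":
    print("system zstd:", system_zstd_version())
```

A version match alone does not guarantee identical struct layouts (defines can still differ), so this is only a first sanity check.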

Looping in @joshmoore if that's OK.

Thanks so much.

@martindurant
Member

Using vendored libs when you're compiling ...

So using the statically linked cramjam or similar is a solution, no?

I still don't understand whether the proposed workflow here can be used for distribution. The comment above suggests "no" to me.

@martindurant
Member

By "statically linked", I mean this is what cramjam does internally, so for numcodecs it would be a simple dependency, with no compile step at all.

@BwL1289
Author

BwL1289 commented Aug 30, 2024

@martindurant do you know when numcodecs will switch to cramjam? Until then, I think this PR can help fill the gap for people needing an interim solution.

It may not be good for distribution, but it's necessary for people who build numcodecs, its dependencies, and libraries in general from source. This PR doesn't remove the ability to build from the vendored sources, so it's purely additive.
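To make "purely additive" concrete, here is a minimal sketch of an opt-in switch of the kind a `setup.py` could use. It is not the code from this PR: the environment variable `EXAMPLE_USE_SYSTEM_BLOSC`, the function name, and all file paths are placeholders invented for illustration. The returned keys (`sources`, `include_dirs`, `libraries`) are the usual `setuptools.Extension` keyword arguments.

```python
import os

# Placeholder paths, assumed for illustration only.
BINDING_SOURCE = "pkg/blosc_binding.c"
VENDORED_SOURCES = ["vendor/blosc/blosc.c", "vendor/blosc/shuffle.c"]
VENDORED_INCLUDE = "vendor/blosc"


def blosc_extension_kwargs(environ=None):
    """Return Extension keyword arguments for a vendored or system build.

    Without the (hypothetical) flag, the vendored sources are compiled in,
    which mirrors the default behaviour. With the flag set to "1", only the
    binding is compiled and the linker is asked for the system libblosc.
    """
    env = os.environ if environ is None else environ
    if env.get("EXAMPLE_USE_SYSTEM_BLOSC", "0") == "1":
        return {
            "sources": [BINDING_SOURCE],
            "include_dirs": [],
            "libraries": ["blosc"],
        }
    return {
        "sources": [BINDING_SOURCE, *VENDORED_SOURCES],
        "include_dirs": [VENDORED_INCLUDE],
        "libraries": [],
    }


if __name__ == "__main__":
    print(blosc_extension_kwargs({}))
    print(blosc_extension_kwargs({"EXAMPLE_USE_SYSTEM_BLOSC": "1"}))
```

Because the default branch is unchanged, anyone who does not set the flag gets the same build as before.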

@martindurant
Member

I don't know, and I can't say for sure it will happen - I just think it should. This is further complicated by the changes required by zarr3.

@joshmoore
Member

Thanks, @BwL1289. 👍 for having this as a new flag that leaves everything untouched unless someone invokes it. The only possible downside I can imagine is the (slight) increase in complexity of the build logic, but it looks generally well done. From my side, no objections. Pinging @jakirkham and @dstansby for any potential concerns.

(This should also not dissuade anyone from picking up the cramjam flag, of course 😄)

@BwL1289
Author

BwL1289 commented Aug 31, 2024

Thank you @joshmoore, much appreciated.

And agreed re: cramjam.

I think the slight increase in build complexity is a worthwhile tradeoff in the interim for folks who need the functionality.

Happy to make improvements wherever possible.

@dstansby
Contributor

Could you expand a bit on what the use case for this is? I think all the use cases described in #464 would be fixed by depending on Python packages, so that we don't need to build the libraries ourselves.

On a project level I'm reluctant about this because it removes the tight coupling between numcodecs versions and compressor library versions. It opens the possibility of numcodecs binaries existing in the wild that include arbitrary versions of the compression libraries, which makes it harder for us to debug what looks like a 'standard' numcodecs package.

On a practical level I'm reluctant about this because it is untested, and adds yet more custom code to setup.py to maintain (and we are short on maintenance time!)... the second worry would be much less of a worry if this was tested, so I think a condition to merging this should be having a test run in GitHub actions that checks the new code in setup.py works.
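One shape such a check could take is a round-trip smoke test run after building against the system libraries. The sketch below is an assumption about what that could look like, not an existing test in the repository: `roundtrip_ok` accepts any object with `encode` and `decode` methods, and `ZlibStandIn` is a stdlib-only stand-in so the example runs without a compiled extension. In a real CI job the stand-in would presumably be replaced by the codecs built against the system libraries (for example `numcodecs.Blosc()`, `numcodecs.Zstd()`, and `numcodecs.LZ4()`).

```python
import zlib


def roundtrip_ok(codec, payload):
    """Return True if decode(encode(payload)) reproduces the payload bytes."""
    encoded = codec.encode(payload)
    decoded = codec.decode(encoded)
    # memoryview() accepts bytes as well as buffer-exporting arrays.
    return bytes(memoryview(decoded)) == bytes(payload)


class ZlibStandIn:
    """Stand-in codec with the same encode/decode shape, using stdlib zlib."""

    def encode(self, buf):
        return zlib.compress(bytes(buf))

    def decode(self, buf):
        return zlib.decompress(bytes(buf))


if __name__ == "__main__":
    payload = bytes(range(256)) * 64
    print("round trip ok:", roundtrip_ok(ZlibStandIn(), payload))
```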

FWIW I've also been playing around with un-bundling sources, starting with blosc: #569. I've just opened that as a proof-of-concept draft, but if there's demand for that work to continue then I'm happy to push it forward (I guess using cramjam + blosc as the two external libraries numcodecs depends on?).
