-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jax fixes #323154
Jax fixes #323154
Conversation
da16155
to
b5e7b73
Compare
Deferring to the CUDA team, they are much closer to Jax. |
pkgs/top-level/python-packages.nix
Outdated
@@ -6114,13 +6114,13 @@ self: super: with self; { | |||
IOKit = pkgs.darwin.apple_sdk_11_0.IOKit; | |||
}; | |||
|
|||
jaxlib = self.jaxlib-build; | |||
jaxlib = if self.jaxlib-build.meta.broken then self.jaxlib-bin else self.jaxlib-build; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh I'm not sure how good of an idea this is: I can see myself tearing my hair out a few steps down the line (steps involving changing meta.broken) trying to figure out how I ended up with an unfree pypi wheel in the closure instead of a cpu-only source-build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't mind, but I get your point. Feel free to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this feels like a footgun.
Another option: we alias jaxlib = jaxlib-bin
and leave a comment linking to this PR and the hash mismatch issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I like samuela's simple approach. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest a more radical approach, which would probably cause frustration to other maintainers: we add a lib.warn
to the if broken then jaxlib-bin
branch saying that we're using a prebuilt and patchelfed jaxlib from pypi, potentially including unfree components
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "if broken then jaxlib-bin else jaxlib-build" logic existed in the jax package.
Ohhh, I had completely missed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there was already an alias in place, and in this PR it place two roles:
- Supporting darwin
- Working around the hash issue
For the former, I'd replace the logic with if jaxlib.meta.unsupported then jaxlib-bin else jaxlib
to look at platforms
instead of broken
. For the latter, I hope we can work around the instability soon.
And if the build version is broken, the packages should keep working.
Yes but whether shipping a substitute "product" counts as "it's still working" is a matter of setting expectations.
Anyway I see now that my comment was out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but whether shipping a substitute "product" counts as "it's still working" is a matter of setting expectations.
Agreed. One difference I glossed over is that the jaxlib-source
also supports different mtl
variants. If the user overrides that package, he could be surprised by the switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "if broken then jaxlib-bin else jaxlib-build" logic existed in the jax package.
Yes, I wrote this in order to fix the build on darwin.
Ok, there was already an alias in place
Actually, there is/was currently no such alias in place. The "if broken then jaxlib-bin else jaxlib-build" was only logic local to the jax package so that it would use jaxlib in checkPhase
, but its impact was limited to that. See
jaxlib' = if jaxlib.meta.broken then jaxlib-bin else jaxlib; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just writting if stdenv.isDarwin then A else B
is way easier to understand and we should probably just go with that.
@@ -130,10 +127,14 @@ buildPythonPackage rec { | |||
"testQdwhWithOnRankDeficientInput5" | |||
]; | |||
|
|||
disabledTestPaths = lib.optionals (stdenv.isDarwin && stdenv.isAarch64) [ | |||
disabledTestPaths = [ | |||
# Segmentation fault |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an upstream issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. I don't know if this is an upstream or a packaging issue.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the log I'm 95% confident this is an upstream issue.
pkgs/top-level/python-packages.nix
Outdated
@@ -6114,13 +6114,13 @@ self: super: with self; { | |||
IOKit = pkgs.darwin.apple_sdk_11_0.IOKit; | |||
}; | |||
|
|||
jaxlib = self.jaxlib-build; | |||
jaxlib = if self.jaxlib-build.meta.broken then self.jaxlib-bin else self.jaxlib-build; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this feels like a footgun.
Another option: we alias jaxlib = jaxlib-bin
and leave a comment linking to this PR and the hash mismatch issue.
pkgs/top-level/python-packages.nix
Outdated
# Using the -bin version since the -source one is currently broken (see #323154) | ||
jaxlib = self.jaxlib-bin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I've no preference as to whether this is implemented in jax/default.nix
or in python-packages.nix
. Let's keep this as a source build on linux-*
. Let's keep it at -bin
for darwin-*
but use meta.unsupported
instead of broken
. With a comment that this is a macos exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to this logic
So that packages depending on jaxlib can also build on macOS.
to the currently supported ones, so that the top-level jaxlib can fallback to jaxlib-bin
# aarch64-darwin is broken because of https://github.com/bazelbuild/rules_cc/pull/136 | ||
# however even with that fix applied, it doesn't work for everyone: | ||
# https://github.com/NixOS/nixpkgs/pull/184395#issuecomment-1207287129 | ||
# NOTE: We always build with NCCL; if it is unsupported, then our build is broken. | ||
broken = effectiveStdenv.isDarwin || nccl.meta.unsupported; | ||
platforms = platforms.linux; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to make the new meta.unsupported
logic work. Honestly, I fail to predict what's this going to look like on search.nixos.org now
# aarch64-darwin is broken because of https://github.com/bazelbuild/rules_cc/pull/136 | ||
# however even with that fix applied, it doesn't work for everyone: | ||
# https://github.com/NixOS/nixpkgs/pull/184395#issuecomment-1207287129 | ||
# NOTE: We always build with NCCL; if it is unsupported, then our build is broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: We always build with NCCL; if it is unsupported, then our build is broken.
This is a peculiar comment, if the derivation depended on nccl and nccl was unsupported for the platform, the evaluation would fail saying exactly that: nccl is not supported on this platform. It also seems like the cpu-only jaxlib didn't actually have nccl in the inputs. Am I reading this right? CC @samuela
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not entirely sure... my read is that cudaSupport -> using nccl
, and the logic that was added just happened to forget about the CPU-only case.
cc @ConnorBaker who wrote this in 9bebd9e
Co-authored-by: Sandro <[email protected]>
Description of changes
Doing a pass to get jax are related libraries to a good place.
Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.