Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jax fixes #323154

Merged
merged 5 commits into from
Jul 4, 2024
Merged

Jax fixes #323154

merged 5 commits into from
Jul 4, 2024

Conversation

zimbatm
Copy link
Member

@zimbatm zimbatm commented Jun 28, 2024

Description of changes

Doing a pass to get jax are related libraries to a good place.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@zimbatm zimbatm force-pushed the jax-fixes branch 2 times, most recently from da16155 to b5e7b73 Compare June 28, 2024 15:18
@ofborg ofborg bot added 8.has: clean-up 8.has: package (new) This PR adds a new package labels Jun 28, 2024
@ofborg ofborg bot requested a review from samuela June 28, 2024 15:49
@zimbatm zimbatm marked this pull request as ready for review June 28, 2024 16:21
@zimbatm zimbatm requested review from mweinelt and GaetanLepage June 28, 2024 16:51
@mweinelt
Copy link
Member

Deferring to the CUDA team, they are much closer to Jax.

@mweinelt mweinelt requested review from SomeoneSerge and removed request for mweinelt June 28, 2024 16:58
@zimbatm zimbatm requested a review from a team June 28, 2024 22:41
@@ -6114,13 +6114,13 @@ self: super: with self; {
IOKit = pkgs.darwin.apple_sdk_11_0.IOKit;
};

jaxlib = self.jaxlib-build;
jaxlib = if self.jaxlib-build.meta.broken then self.jaxlib-bin else self.jaxlib-build;
Copy link
Contributor

@SomeoneSerge SomeoneSerge Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh I'm not sure how good of an idea this is: I can see myself tearing my hair out a few steps down the line (steps involving changing meta.broken) trying to figure out how I ended up with an unfree pypi wheel in the closure instead of a cpu-only source-build

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't mind, but I get your point. Feel free to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this feels like a footgun.

Another option: we alias jaxlib = jaxlib-bin and leave a comment linking to this PR and the hash mismatch issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I like samuela's simple approach. Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a more radical approach, which would probably cause frustration to other maintainers: we add a lib.warn to the if broken then jaxlib-bin branch saying that we're using a prebuilt and patchelfed jaxlib from pypi, potentially including unfree components

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "if broken then jaxlib-bin else jaxlib-build" logic existed in the jax package.

Ohhh, I had completely missed that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there was already an alias in place, and in this PR it place two roles:

  • Supporting darwin
  • Working around the hash issue

For the former, I'd replace the logic with if jaxlib.meta.unsupported then jaxlib-bin else jaxlib to look at platforms instead of broken. For the latter, I hope we can work around the instability soon.

And if the build version is broken, the packages should keep working.

Yes but whether shipping a substitute "product" counts as "it's still working" is a matter of setting expectations.

Anyway I see now that my comment was out of scope

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but whether shipping a substitute "product" counts as "it's still working" is a matter of setting expectations.

Agreed. One difference I glossed over is that the jaxlib-source also supports different mtl variants. If the user overrides that package, he could be surprised by the switch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "if broken then jaxlib-bin else jaxlib-build" logic existed in the jax package.

Yes, I wrote this in order to fix the build on darwin.

Ok, there was already an alias in place

Actually, there is/was currently no such alias in place. The "if broken then jaxlib-bin else jaxlib-build" was only logic local to the jax package so that it would use jaxlib in checkPhase, but its impact was limited to that. See

jaxlib' = if jaxlib.meta.broken then jaxlib-bin else jaxlib;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just writting if stdenv.isDarwin then A else B is way easier to understand and we should probably just go with that.

@@ -130,10 +127,14 @@ buildPythonPackage rec {
"testQdwhWithOnRankDeficientInput5"
];

disabledTestPaths = lib.optionals (stdenv.isDarwin && stdenv.isAarch64) [
disabledTestPaths = [
# Segmentation fault
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have an upstream issue for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. I don't know if this is an upstream or a packaging issue.

This comment was marked as outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the log I'm 95% confident this is an upstream issue.

pkgs/development/python-modules/jaxlib/default.nix Outdated Show resolved Hide resolved
@@ -6114,13 +6114,13 @@ self: super: with self; {
IOKit = pkgs.darwin.apple_sdk_11_0.IOKit;
};

jaxlib = self.jaxlib-build;
jaxlib = if self.jaxlib-build.meta.broken then self.jaxlib-bin else self.jaxlib-build;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this feels like a footgun.

Another option: we alias jaxlib = jaxlib-bin and leave a comment linking to this PR and the hash mismatch issue.

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Jun 30, 2024
@ofborg ofborg bot requested a review from samuela June 30, 2024 19:01
@wegank wegank removed the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Jun 30, 2024
Comment on lines 6117 to 6118
# Using the -bin version since the -source one is currently broken (see #323154)
jaxlib = self.jaxlib-bin;
Copy link
Contributor

@SomeoneSerge SomeoneSerge Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I've no preference as to whether this is implemented in jax/default.nix or in python-packages.nix. Let's keep this as a source build on linux-*. Let's keep it at -bin for darwin-* but use meta.unsupported instead of broken. With a comment that this is a macos exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to this logic

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin and removed 10.rebuild-darwin: 11-100 labels Jul 3, 2024
to the currently supported ones, so that the top-level jaxlib can fallback to jaxlib-bin
@SomeoneSerge SomeoneSerge requested a review from natsukium as a code owner July 3, 2024 14:49
# aarch64-darwin is broken because of https://github.com/bazelbuild/rules_cc/pull/136
# however even with that fix applied, it doesn't work for everyone:
# https://github.com/NixOS/nixpkgs/pull/184395#issuecomment-1207287129
# NOTE: We always build with NCCL; if it is unsupported, then our build is broken.
broken = effectiveStdenv.isDarwin || nccl.meta.unsupported;
platforms = platforms.linux;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to make the new meta.unsupported logic work. Honestly, I fail to predict what's this going to look like on search.nixos.org now

# aarch64-darwin is broken because of https://github.com/bazelbuild/rules_cc/pull/136
# however even with that fix applied, it doesn't work for everyone:
# https://github.com/NixOS/nixpkgs/pull/184395#issuecomment-1207287129
# NOTE: We always build with NCCL; if it is unsupported, then our build is broken.
Copy link
Contributor

@SomeoneSerge SomeoneSerge Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: We always build with NCCL; if it is unsupported, then our build is broken.

This is a peculiar comment, if the derivation depended on nccl and nccl was unsupported for the platform, the evaluation would fail saying exactly that: nccl is not supported on this platform. It also seems like the cpu-only jaxlib didn't actually have nccl in the inputs. Am I reading this right? CC @samuela

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not entirely sure... my read is that cudaSupport -> using nccl, and the logic that was added just happened to forget about the CPU-only case.

cc @ConnorBaker who wrote this in 9bebd9e

@ofborg ofborg bot added 10.rebuild-darwin: 11-100 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin labels Jul 3, 2024
@zimbatm zimbatm merged commit 9f38e1c into NixOS:master Jul 4, 2024
26 checks passed
@zimbatm zimbatm deleted the jax-fixes branch July 4, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants