postgresql_14: fix build #368091
@wolfgangwalther postgresql_14 builds for me on master
can you provide more info? perhaps create an issue? |
Uh, that's really odd. For me, it can't be found in the binary cache, but for you it seems it can be? We'd have to look up hydra logs to see whether the same failure happened there. In any case, I had the same failure appear in a CI job elsewhere, so it's not only related to my system. This smells like a nix bug. Which version of I have:
|
Also the hash for me is different when building this on master:
@kirillrdy which system are you building on? |
also for me, it wasn't in a cache, i had to build it |
Same for me. I don't understand why my hash is different. |
🙈 ah, it was the |
try flakes
i get same output hash
|
|
lolwat 😅 Does this affect v14 only or other versions as well? Would prefer to spend a little more time to investigate this. I don't think I'll get to much during the holidays though. |
Only v14 :D
I haven't bisected either nixpkgs or nix, so I can't tell when it started to appear. I just noticed it when trying to update PostgREST's CI, which also runs postgresql 14. |
Assuming that it's a nix bug, we currently have this:
It started failing in hydra here: https://hydra.nixos.org/build/281720533 The logs here don't show anything helpful: https://hydra.nixos.org/build/281937414 - essentially they are cut off right before the error that I get. |
The error is very odd, but rebuilding all postgresql versions on master would be relatively expensive, as labeled by CI. |
Looking at https://hydra.nixos.org/eval/1810750?filter=postgresql_ – the darwin errors seem different, so I think only postgresql_14.x86_64-linux is affected. I think it would be fine to merge this if it's made conditional, e.g. for version 14 on any platform. Though it would be nice to better understand when/why this happens. |
Yes, absolutely. We'd have to let this go through staging or add a conditional as you say.. but I'd really like to understand what's happening, if only remotely so, first. |
bisecting only led me to #361878 so far, not anything more precise. |
I have the same build failure with nix 2.24.11 as well (both with and without flakes). No idea why it built successfully for @kirillrdy. |
I'm doing a full bisect now. |
@wolfgangwalther have you tried a pure build using flakes? and do you still get the same output hash as me, or a different one? |
ok, so now I get same build failure, and I can not reproduce build using same nixos rev and nixpkgs rev 🤷 |
I was able to fix the build with this commit: bendlas@a468e26 also relevant: NixOS/nix#12105 |
It's an automatic merge. I retested again that it really fails on this commit and succeeds on both parents. I don't really feel closer to the real source of the issue, but maybe someone else will have an idea. |
And I can fix the build with the commit in this PR - just changing a bash comment. Thus, I'm not sure whether the removal of references is really necessary here or fixing the actual issue. |
When I revert the commit in #363710, the build succeeds for me. |
To double-check, I built postgresql_14 with the But: When I remove the `lib.disallowedReferences` for doc (and keep man), then the build succeeds... So it's really: Anything that changes the derivation in any way... makes the build succeed. Another random example: I changed the order of two configure flags - the build succeeds. |
Sometimes this happened when a sloppy regex (typically in upstream build scripts) matched a part of the /nix/store hash. |
I think I found the issue: if an output is already registered (man output here, not entirely sure why yet), Nix removes the output from the list of outputs that need reference checks. That's reasonable, but this exact list is also used to check if an output has references to another output (lib -> man). This also explains why @kirillrdy ran into this issue after a successful build, I think: the first time the I'll try to come up with a patch for Lix. I guess it should be relatively simple to port to CppNix then, this part doesn't seem to have too many Lix-specific changes. If my theory is correct, this patch is also not helping: it simply works now because |
Yea, I had this issue as well with Nix 2.24.10, and running |
This matches my observation that substituting only the man output reliably triggers the issue (given that the other outputs don't seem to be on cache.nixos.org). |
In that case, a proper fix for now would be to add outputChecks for the other outputs, including man, too, right?
Yeah... I guess that's the only way to do it. I assume this bug hasn't been noticed so far, because postgresql is literally the first derivation in nixpkgs to use outputChecks. If we want to go forward with structuredAttrs, we'd need to make sure that all supported nix versions do work properly with outputChecks. |
Building postgresql_14 currently fails with this on master: error: derivation contains an illegal reference specifier 'man' The reason seems to be a bug in nix, where outputChecks are run improperly when one of the outputs can already be substituted. Why the man output can be substituted from hydra is unknown, but adding more outputChecks for the man and doc outputs should work around the problem until nix is fixed.
Force-pushed from 96ecb88 to a0919c7.
I pushed a commit doing that for v14 only. We can merge this to master, and then remove the conditional through staging. |
Apologies for my delayed responses, I have quite a lot to do on multiple fronts currently. First of all, for those who are interested, feel free to review https://gerrit.lix.systems/c/lix/+/2346; this should fix the underlying bug.
I'm afraid I don't follow: if we ever get into the situation that only a part of the postgresql store-paths are cached on cache.nixos.org, the list of outputs to check would be incomplete again because the current implementation only adds the output-paths to it that are not registered already in your local store. I may be missing something, but I'm not sure if we can do much here, actually. Don't get me wrong, I'd be very happy to be proven wrong here! |
I see - that was probably a misunderstanding on my side. I had hoped/assumed that because outputChecks for those outputs would need to be run, they would need to be added to the list as well. (I have not looked at your fix yet.) But what you're saying implies that those outputChecks had already been run before and are not run again for substituted outputs - which makes sense. So yeah, the "fix" merged here doesn't seem to be better than any other changes we could have made to the derivation. In any case, those additional outputChecks are not hurting, so I think we can keep them / proceed with them on staging.
Yeah, there seems to be no other way. |
The thing is that I believe it's not rare to have a year-old nix (or even older). So I'd say we're mainly "lucky" that no really important package seems affected so far. |
No, this bug can only surface when a derivation uses structuredAttrs and outputChecks. postgresql is the only one in nixpkgs doing that, so far. So this happened actually fairly quickly: We introduced the structuredAttrs+outputChecks in August of last year and a few months later we hit this bug - with only a single package using those. But if we really enable structuredAttrs by default eventually, this needs to be fixed in all supported nix versions. |
Nixpkgs issues / PRs:

* NixOS/nixpkgs#368091
* NixOS/nixpkgs#369366

This can be triggered with the postgresql_14 derivation from nixpkgs rev 19305d94dacca226ca048b78e6de00f599c65858 (/nix/store/bxp6g57limvwiga61vdlyvhy7i8rp6wd-postgresql-14.15.drv on x86_64-linux): for reasons unknown to me, only the `man` and `lib` outputs are cached on cache.nixos.org:

    $ nix derivation show /nix/store/bxp6g57limvwiga61vdlyvhy7i8rp6wd-postgresql-14.15.drv | jq '.[].outputs.[].path' -r | xargs nix path-info --store https://cache.nixos.org
    warning: The interpretation of store paths arguments ending in `.drv` recently changed. If this command is now failing try again with '/nix/store/bxp6g57limvwiga61vdlyvhy7i8rp6wd-postgresql-14.15.drv^*'
    don't know how to build these paths:
      /nix/store/m9vb40xxr6gckjzpfxnqcmjqsks2gx03-postgresql-14.15
      /nix/store/nm1415wa53iawar9axwxy0an6ximhayn-postgresql-14.15-dev
      /nix/store/v9vrvfhiw9gk8hj9895sb15fxvxnyylj-postgresql-14.15-debug
      /nix/store/zi12g1p99g2173i8093ixbqkfh9ng87b-postgresql-14.15-doc
      /nix/store/3i3fpz0xss9inampf51gp3pkx24ypxpj-postgresql-14.15-man
      /nix/store/db8797h2cp4rm1cnsqrf87apkkxwwdff-postgresql-14.15-lib
    error: path '/nix/store/m9vb40xxr6gckjzpfxnqcmjqsks2gx03-postgresql-14.15' does not exist in the store

Also, the derivation uses the `outputChecks` feature (and thus `__structuredAttrs`) to make sure that e.g. the `out` output doesn't reference the `man` output:

    __structuredAttrs = true;
    outputs = [ "out" "dev" "doc" "lib" "man" ];
    outputChecks.out.disallowedReferences = [ "dev" "doc" "man" ];

With all that in place, the following error was hit on all CppNix / Lix versions currently supported when trying to build the derivation above:

    error: derivation contains an illegal reference specifier 'man'

The following happened here:

* The `man` & `lib` outputs were substituted at some point.
* When registering outputs, the reference checks are made.
* `LocalDerivationGoal::checkOutputs` gets a map of all outputs that were built and are NOT already registered in the store. In the example above this means `out`, `dev`, `debug` and `doc`.
* `checkOutputs` tries to resolve the `man` output and fails to do so because it's a store-path that's already registered and thus not part of the map passed to `checkOutputs`.

Since the map passed to `checkOutputs` is used in various other places that appear to assume that the paths aren't registered already, I didn't write the already registered paths into it. Instead, I created a second map that contains all already registered outputs and pass it as third argument to `checkOutputs`. If the other lookups fail, this map will now be checked before the "illegal reference specifier"-error is thrown.

This fixes the problem with `postgresql_14` for me. Also wrote a small regression test that fails locally without the patch in place.

Change-Id: Ieacca80c001fcfbebf6f5fe97e25c49d2724c3ff
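For readers who don't want to dig through the C++: a rough Python model of the lookup described in the commit message, to show why `man` can't be resolved when it was substituted and why a second map helps. This is only a sketch under assumptions, not the actual Nix/Lix code: `resolve_specifier`, `IllegalReferenceSpecifier` and `check` are made-up names, and the store paths are placeholders without real hashes.

```python
class IllegalReferenceSpecifier(Exception):
    """Stands in for the Nix error quoted in the commit message."""


def resolve_specifier(spec, outputs_to_check, already_registered=None):
    """Turn one entry of disallowedReferences into a store path.

    outputs_to_check: output name -> path, for outputs built in this run
        that are not yet registered in the local store.
    already_registered: output name -> path, for outputs that were already
        registered (e.g. substituted). None models the pre-patch behaviour.
    """
    if spec.startswith("/nix/store/"):
        return spec  # already a store path, nothing to resolve
    if spec in outputs_to_check:
        return outputs_to_check[spec]
    if already_registered is not None and spec in already_registered:
        return already_registered[spec]  # fallback added by the patch
    raise IllegalReferenceSpecifier(
        f"derivation contains an illegal reference specifier '{spec}'"
    )


# Placeholder paths (hypothetical); real ones carry store hashes.
outputs_to_check = {
    "out": "/nix/store/aaaa-postgresql-14.15",
    "dev": "/nix/store/bbbb-postgresql-14.15-dev",
    "debug": "/nix/store/cccc-postgresql-14.15-debug",
    "doc": "/nix/store/dddd-postgresql-14.15-doc",
}
already_registered = {
    "man": "/nix/store/eeee-postgresql-14.15-man",
    "lib": "/nix/store/ffff-postgresql-14.15-lib",
}

# Mirrors outputChecks.out.disallowedReferences from the derivation.
disallowed = ["dev", "doc", "man"]


def check(registered=None):
    """Resolve every disallowed reference, or report the first failure."""
    try:
        return [resolve_specifier(s, outputs_to_check, registered) for s in disallowed]
    except IllegalReferenceSpecifier as err:
        return f"error: {err}"


print(check())                    # pre-patch: 'man' is not in the map
print(check(already_registered))  # with the fallback map: resolves
```

The second map is kept separate on purpose in this model, same as in the patch description: the first map is still "outputs that need checks", and the second is only consulted to resolve names.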
Building postgresql_14 currently fails with this on master:
The odd thing is that changing anything in the postInstall phase seems to fix this, including just adding a single character as a comment.
I have no clue what's happening here.
I don't have too much time the next couple of days to investigate this.