Use binary cache at `store_uri` for certain operations #1360
base: nix-next
Replaces / Closes NixOS#1352

Consider the following setup:

* `store_uri` points to a "special" store, e.g. `s3://` or `file://`.
* Hydra runs inside e.g. an nspawn container and exclusively builds on remote systems.

Then, build products are never in the nspawn's `/nix/store`. This in turn means that `$MACHINE_LOCAL_STORE->isValidPath` is always `false` (or `0` 🤷) for output paths and thus Hydra wrongly claims that the output got garbage collected.

Instead, use the BINARY_CACHE_STORE to look for the availability in the correct store.

Everything touching the `drv` rather than the output paths still uses MACHINE_LOCAL_STORE: this is because `hydra-eval-jobs` does `openStore()` on `auto`, so the derivations end up there rather than on the BINARY_CACHE_STORE.
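The situation described in this commit message can be pictured with a small sketch. It is written in Python rather than Hydra's Perl, and every name in it (`FakeStore`, `output_status`, the example paths and URIs) is hypothetical. It only illustrates why asking the local store about outputs that were built remotely always reports them as missing, while the store behind `store_uri` gives the right answer.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical stand-in for a Nix store; not the real Perl bindings."""
    uri: str
    paths: set = field(default_factory=set)

    def is_valid_path(self, path: str) -> bool:
        # A path is "valid" if this particular store contains it.
        return path in self.paths


def output_status(store: FakeStore, out_path: str) -> str:
    """Report an output as available only if the given store has it."""
    return "available" if store.is_valid_path(out_path) else "garbage collected"


# Assumed setup: builds run remotely and land in the binary cache,
# so the container's local store never contains the output path.
out = "/nix/store/example-output"  # placeholder, not a real store path
local_store = FakeStore("auto")
binary_cache = FakeStore("file:///var/cache/hydra", {out})

print(output_status(local_store, out))
print(output_status(binary_cache, out))
```

Passing the store in as an argument, instead of letting the helper pick one, is the same design choice a later commit in this PR describes.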
Now that we don't assume the local store anymore, but use the BINARY_CACHE_STORE which is effectively `openStore(store_uri)`, it doesn't really matter where the store is. Also, this doesn't seem too expensive for at most 5 build outputs.
Otherwise the `include`-test would fail: this test defines a `hydra.conf` with an `include` statement. However, the file to be included is created after the test base is set up. This means that - among others - the `Hydra::Helper::Nix` module is loaded before the file to include exists. Since the global variables are initialized immediately and try to read from the config, the test fails, because loading the config at that time doesn't work. Delaying this into a subroutine solves the issue.

I decided against messing around with the config mechanism here since I consider this behavior to be correct.

The local store (i.e. `Hydra::Store->new()` w/o args) is already cached in the Perl bindings[1]; however, this isn't the case for the binary cache store. Because of that, a `state` variable is used to cache the binary cache store. At runtime, the location of that store doesn't change anyway.

[1] https://github.com/NixOS/nix/blob/e3ccb7e42a55529d8d50544e7e0cebb5cbc606cf/perl/lib/Nix/Store.xs#L88-L96
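The pattern described above, reading the config lazily inside a subroutine and caching the opened store, can be sketched as follows. This is a Python analogue, not the Perl code from the PR: `functools.lru_cache` plays the role of the `state` variable, and `config`, `open_store` and `binary_cache_store` are hypothetical names introduced for illustration.

```python
import functools

config = {}      # hypothetical config; still incomplete at "module load" time
open_count = 0   # counts how often the store is actually opened


def open_store(uri: str) -> dict:
    """Hypothetical stand-in for opening a store by URI."""
    global open_count
    open_count += 1
    return {"uri": uri}


@functools.lru_cache(maxsize=None)
def binary_cache_store() -> dict:
    # The config is read on the first call, not at import time, so a config
    # that only becomes complete later is still picked up correctly.
    return open_store(config.get("store_uri", "auto"))


# The included file "appears" only after the module has been loaded.
config["store_uri"] = "file:///var/cache/hydra"

first = binary_cache_store()
second = binary_cache_store()
print(first["uri"], open_count)
```

A module-level global would have captured the fallback value `auto` here, because it would be evaluated before `store_uri` is set.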
1d5b3d5
to
67de7a3
I'm still sorta confused how it ever worked before, and whether the performance ramifications of looking up in the binary cache, when we previously looked up in the local store (presumably sometimes successfully?), are OK.
Given how the Perl bindings used to look like, my best guess is that Perhaps @edolstra knows a little more?
It should be OK for Perhaps we want to leave that part out for now or split it up once more even?
@Ma27 Right. Yeah, the fact that Hydra presumably supported binary caches long before Nix had the store abstraction that included binary caches is what makes things confusing: why does it seem like the old way of working with the binary cache is wrong even though we've had it for quite a while?
This method is applicable on all kinds of stores, so let's give it a store rather than letting it decide which store to use.
Yeah, I tried to use that when I first set up a Hydra and I think I never managed to set up binary caching back then. Considering that I wouldn't expect local stores to be much of an impact, but you know the implementations better than me. Also, the most used thing is probably the build overview which is basically a few OTOH I'm not sure doing that for remote stores is sensible because of potential latency and because S3 is pay by requester now: NixOS/infra#299 So perhaps we want to exclude 44c98e1 from this PR @Ericson2314 ?
Pay by requester shouldn't be relevant here - I doubt anyone other than us is directly using s3://nix-cache for their Hydra instance, and when the requester is us (NixOS Hydra) we'd be paying either way even if it wasn't pay by requester. Plus, the operations here are all super cheap; it's bandwidth that's the expensive part.

Latency could indeed be a problem: this is switching from an operation that completes in milliseconds (checking the existence of a local path) to an operation that completes in hundreds of milliseconds (cross-Atlantic roundtrip), and there's no parallelism or batching anywhere to try and alleviate this.

Of course, I'm not saying that it is a problem - I don't know nearly enough about the Hydra codebase to even understand what's being changed here. But if we were previously checking 100+ paths in the local store, and we switch to checking 100+ paths in a remote store, we're going from something near-instant (sub-second) to something that needs roughly a minute to run.
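The estimate in this comment can be reproduced with back-of-the-envelope arithmetic. The numbers below are assumptions chosen to match the orders of magnitude mentioned (about 5 ms per local check, about 500 ms per remote roundtrip, 100 paths); they are not measurements from any Hydra instance. The `batched_seconds` helper models a hypothetical mitigation that the comment notes does not currently exist.

```python
import math


def sequential_seconds(n_paths: int, seconds_per_check: float) -> float:
    """Total time when every path is checked one after another."""
    return n_paths * seconds_per_check


def batched_seconds(n_paths: int, seconds_per_check: float, concurrency: int) -> float:
    """Total time if `concurrency` checks could run in parallel (hypothetical)."""
    return math.ceil(n_paths / concurrency) * seconds_per_check


N_PATHS = 100           # assumed number of paths to check
LOCAL_CHECK = 0.005     # assumed: ~5 ms per local store lookup
REMOTE_CHECK = 0.5      # assumed: ~500 ms per remote roundtrip

local_total = sequential_seconds(N_PATHS, LOCAL_CHECK)
remote_total = sequential_seconds(N_PATHS, REMOTE_CHECK)
parallel_total = batched_seconds(N_PATHS, REMOTE_CHECK, concurrency=20)

print(f"local, sequential:   {local_total:.1f} s")
print(f"remote, sequential:  {remote_total:.1f} s")
print(f"remote, 20 parallel: {parallel_total:.1f} s")
```

Under these assumptions the sequential remote case lands at just under a minute, which is consistent with the rough figure given in the comment.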
Yep, agreed about the latency part.
These are the changes that I'd consider good to go. While revisiting
Regarding the latency, I think this should be fine because a lot of the queries are going to be repeats and the narinfo cache prevents repeated checks? Not 100% sure but I'm in favour of merging it without the split and seeing how it impacts performance.
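The intuition here, that repeated queries for the same path need not hit the remote store again, can be sketched with a simple memoising wrapper. This is not Nix's narinfo cache implementation; `CachedLookup`, `slow_remote_check` and the one-hour TTL are all hypothetical and only show the general effect of caching lookup results for a while.

```python
import time


class CachedLookup:
    """Hypothetical memoising wrapper around a slow remote validity check."""

    def __init__(self, remote_check, ttl_seconds: float):
        self._remote_check = remote_check
        self._ttl = ttl_seconds
        self._cache = {}  # path -> (result, expiry timestamp)

    def is_valid_path(self, path: str) -> bool:
        now = time.monotonic()
        hit = self._cache.get(path)
        if hit is not None and hit[1] > now:
            return hit[0]  # answered from the cache, no roundtrip
        result = self._remote_check(path)
        self._cache[path] = (result, now + self._ttl)
        return result


remote_calls = []  # records every query that reaches the "remote" store


def slow_remote_check(path: str) -> bool:
    remote_calls.append(path)
    return path.endswith("-present")  # toy rule standing in for a real lookup


lookup = CachedLookup(slow_remote_check, ttl_seconds=3600)
results = [lookup.is_valid_path("/nix/store/example-present") for _ in range(3)]
print(results, len(remote_calls))
```

Only the first query pays the roundtrip in this sketch; whether the real cache behaves this way for Hydra's access pattern is exactly what the comment leaves open.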
Fine by me, but that call should be made by the operators of hno
Based on #1359. Trivial now that we have Perl bindings that accept a store URI.
This changes the following operations to not use the local store, but the binary cache store instead:
Also switched from a global variable to `sub`s to keep the include test working that needs to read config (for the `store_uri`) before the files to be included are created.

Closes #1352 as it solves the same issue.
cc @Ericson2314