-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: incremental zstd mode #387
Conversation
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
I can't just let @j-baker win with #386 after all that arguing. ;) Turns out it was surprisingly easy, and in my local tests looks like it shrank the second state target archive from 12GB to 6GB (which then gets compressed to 2.08GB) Here is the PR using for my project where I care about it most ("after"): fedimint/fedimint#3219 Here is the reference build ("before"): https://github.com/fedimint/fedimint/actions/runs/6212380934/job/16862451162?pr=3212 |
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
All right. There was one bug (must use The timestamps are mixed, as I had to re-run, and for some there was no point in re-copy-pasting. Beforedeps only, compression:
workspace, decompression:
workspace, compression:
tests, decompression:
Afterdeps only, compression:
workspace, decompression:
workspace, compression:
tests, decompression:
Summary
As far as I can tell it is working as expected. |
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for implementing this! The solution turned out way simpler than I had feared 🎉
Have a few minor clean up comments (plus we need to update the changelog), but happy to merge when they are addressed!
compressAndInstallCargoArtifactsDirIncremental "${dir}" "${cargoTargetDir}" | ||
;; | ||
|
||
"use-zstd-no-incr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your attention to allowing one to opt into the previous behavior!
I'd okay to call out the new behavior (in the change log) and add a new environment variable (e.g. doNotCompressIncrementally
) to installCargoArtifactsHook.sh
which would give the old behavior back!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call it useFullCargoArtifactsSnapshot
. I think in certain cases it might be useful to use it even when there are no problems with the incremental approach (just to shed the chain of incremental archives, etc.)
However I'm confused how to propagate it to this function. Can I just use a global variable?
BTW. When I starting documentation changes, it did occurred to me again that I think it's better to just use different mode
string for it.
When it's a separate flag, than it's unclear when it's used, when ignored and it becomes hard to understand. When it is its own mode, then there's no problem like this, and it's clear then it's a slightly different mode.
I propose:
use-zstd-full
use-zstd-incremental
oruse-zstd-diff
- define the
use-zstd
and alias to "whatever we think is the best default", and make ituse-zstd-incremental
right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
installCargoArtifactsMode
allows us to only specify one thing, meaning if we want to subtly tweak different installation strategies we'd have to keep tacking on use-zstd-flag-1
, use-zstd-flag-2
, use-zstd-flag-1-and-flag-2
which could get unweildy IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not going to support that many variations, IMO. But if it's neccessary it could be
installCargoArtifactsMode = { mode = "use-zstd"; diff = true; };
which crane can easily detect and handle, I think, with full freedom of parametrizing.
rm -f "${cargoTargetDir}/.crane-previous-archive" | ||
ln -s "${preparedArtifacts}" "${cargoTargetDir}/.crane-previous-archive" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine, though technically we're wasting a few operations waiting for the file system to delete/create the new marker file. Maybe we could store this in a (global) variable to be a bit cheaper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know (still don't) how to do that. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can land as is and then follow-up with improvement?
BTW. I'm totally fine if you want to just redo/tweak and merge this PR as you please. For final touches it is generally faster than going back and forth. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I can take it to the finish line sometime this week!
This should greatly decrease the sizes of workspace artifacts that crane produces. See ipetkov/crane#387
After merging it in our CI pipeline, I've noticed MacOS builds are failing:
and then missing files that should have been extracted from deps only build. Weird. Will try to rebase and debug. |
Ah, damn it. Unfotunately on MacOS that uses
So the generated file copied the mtime from the existing one, and gets skipped. Bummer. Didn't take long to hit a corner case. Another reason why |
Yep - that's (among other reasons) why I didn't like the look of anything which relies on FS metadata - too easy to mess it up accidentally, especially when cross platform. Agree that it'll mostly work, though. It's worth noting that rust buildscripts can do anything, I'm sure you're aware of the cmake crate. Openssl literally compiles ssl in its build... inotify might be another viable approach? subscribe to the dir pre-build, watch for events and stash them for later? |
I think it's OK as is for now. Thanks to using a full mode in deps only step, the worst case (build.rs scripts of 3rd party tools) are virtually never going to get affected by this. And it's always possible to revert to It would be possible make a |
I think for now we can force Darwin builds to do a non-incremental compression until rust-lang/rust#115982 is resolved. We can also follow up with other optimizations later! (I think there's value in landing this in the mean time) |
I'm going to close this since #398 has landed (though feel free to open a new PR if there's anything else worth changing). Thanks again for the feedback and initial direction here! |
Motivation
Fix #76 (implement it).
This changes the
use-zstd
to create a chain oftarget.tar.zst
files, each time capturing only newly created/modified files, usingmtime
, making breaking multiplecargo
invocations into a chain of separate derivations much cheaper (virtually free in terms of disk space, extraction cost proportional to the size of whole./target
, and compression cost relative to the size of only modified/new files).Previous behavior (taking full snapshots of
./target
each time) is retained for time being under a new name (use-zstd-no-incr
), as there's no guarantee that no one ever will hit a case where something modifies a file without changing mtime (though it seems unlikely). Supporting both is very easy anyway.Edit: Only later it occurred to me that incremental approach does not correct delete files (it will keep re-storing them). It could be handled, but seems unnecessary and it will have a perf cost. Current behavior is most probably OK in practice for 99.9% cases. AFAIR science goes, there are no credible reports of
./target
directory ever shrinking in the nature. But an additional reason to keep the old method around.Possibly there should be:
use-zstd-incremental
use-zstd-full
use-zstd
(alias foruse-zstd-incremental
, or any method deemed even better in the future).This way if anyone has a step that actually delete things, they can use
use-zstd-full
as a one-off. It might be also useful if someone really wants to introduce a break in the chain (e.g. they have really, really long chain and it seems worth it).Checklist
docs/API.md
(or general documentation) with changesCHANGELOG.md