Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: incremental zstd mode #387

Closed
wants to merge 5 commits into from
Closed

Conversation

dpc
Copy link
Contributor

@dpc dpc commented Sep 17, 2023

Motivation

Fix #76 (implement it).

This changes the use-zstd to create a chain of target.tar.zst files, each time capturing only newly created/modified files, using mtime, making breaking multiple cargo invocations into a chain of separate derivations much cheaper (virtually free in terms of disk space, extraction cost proportional to the size of whole ./target, and compression cost relative to the size of only modified/new files).

Previous behavior (taking full snapshots of ./target each time) is retained for time being under a new name (use-zstd-no-incr), as there's no guarantee that no one ever will hit a case where something modifies a file without changing mtime (though it seems unlikely). Supporting both is very easy anyway.

Edit: Only later it occurred to me that incremental approach does not correct delete files (it will keep re-storing them). It could be handled, but seems unnecessary and it will have a perf cost. Current behavior is most probably OK in practice for 99.9% cases. AFAIR science goes, there are no credible reports of ./target directory ever shrinking in the nature. But an additional reason to keep the old method around.

Possibly there should be:

  • use-zstd-incremental
  • use-zstd-full
  • use-zstd (alias for use-zstd-incremental, or any method deemed even better in the future).

This way if anyone has a step that actually delete things, they can use use-zstd-full as a one-off. It might be also useful if someone really wants to introduce a break in the chain (e.g. they have really, really long chain and it seems worth it).

Checklist

  • added tests to verify new behavior
  • added an example template or updated an existing one
  • updated docs/API.md (or general documentation) with changes
  • updated CHANGELOG.md

dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
@dpc
Copy link
Contributor Author

dpc commented Sep 17, 2023

I can't just let @j-baker win with #386 after all that arguing. ;)

Turns out it was surprisingly easy, and in my local tests looks like it shrank the second state target archive from 12GB to 6GB (which then gets compressed to 2.08GB)

Here is the PR using for my project where I care about it most ("after"): fedimint/fedimint#3219

Here is the reference build ("before"): https://github.com/fedimint/fedimint/actions/runs/6212380934/job/16862451162?pr=3212

dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
@dpc
Copy link
Contributor Author

dpc commented Sep 17, 2023

All right. There was one bug (must use --no-recursion). Also the 12GB I mentioned before was largely caused by the #370 , so the new baseline seems to be 6GB compressed to 2GB (yay!).

The timestamps are mixed, as I had to re-run, and for some there was no point in re-copy-pasting.

Before

deps only, compression:

Sun, 17 Sep 2023 07:56:12 GMT fedimint-deps> compressing target to /nix/store/bpma92pg41jbk5h0a9w1fp5p4haxz1jg-fedimint-deps-0.0.1/target.tar.zst
Sun, 17 Sep 2023 07:56:16 GMT fedimint-deps> /*stdin*\            : 22.28%   (  3.59 GiB =>    818 MiB, /nix/store/bpma92pg41jbk5h0a9w1fp5p4haxz1jg-fedimint-deps-0.0.1/target.tar.zst)

workspace, decompression:

Sun, 17 Sep 2023 07:28:04 GMT fedimint-workspace-build> decompressing cargo artifacts from /nix/store/bpma92pg41jbk5h0a9w1fp5p4haxz1jg-fedimint-deps-0.0.1/target.tar.zst to target
Sun, 17 Sep 2023 07:28:08 GMT fedimint-workspace-build> configuring

workspace, compression:

Sun, 17 Sep 2023 07:57:11 GMT fedimint-workspace-build> compressing target to /nix/store/0kh9szbry5zhc3fvvlsh4scgs0p43cwp-fedimint-workspace-build-0.0.1/target.tar.zst
Sun, 17 Sep 2023 07:57:18 GMT fedimint-workspace-build> /*stdin*\            : 30.91%   (  6.33 GiB =>   1.96 GiB, /nix/store/0kh9szbry5zhc3fvvlsh4scgs0p43cwp-fedimint-workspace-build-0.0.1/target.tar.zst)

tests, decompression:

Sun, 17 Sep 2023 07:31:53 GMT fedimint-test-all> decompressing cargo artifacts from /nix/store/0kh9szbry5zhc3fvvlsh4scgs0p43cwp-fedimint-workspace-build-0.0.1/target.tar.zst to target
Sun, 17 Sep 2023 07:32:00 GMT fedimint-test-all> configuring

After

deps only, compression:

Sun, 17 Sep 2023 07:32:30 GMT fedimint-deps> compressing target to /nix/store/xvqh9q9sadqzqhc4150hxkc3q9v8rlhl-fedimint-deps-0.0.1/target.tar.zst
Sun, 17 Sep 2023 07:32:34 GMT fedimint-deps> /*stdin*\            : 21.54%   (  3.74 GiB =>    825 MiB, /nix/store/xvqh9q9sadqzqhc4150hxkc3q9v8rlhl-fedimint-deps-0.0.1/target.tar.zst)

workspace, decompression:

Sun, 17 Sep 2023 07:32:37 GMT fedimint-workspace-build> decompressing cargo artifacts from /nix/store/xvqh9q9sadqzqhc4150hxkc3q9v8rlhl-fedimint-deps-0.0.1/target.tar.zst to target
Sun, 17 Sep 2023 07:32:41 GMT fedimint-workspace-build> configuring

workspace, compression:

Sun, 17 Sep 2023 07:51:03 GMT fedimint-workspace-build> compressing target to /nix/store/by7kqaq2jip4h2687hzjhylfdgldmj9f-fedimint-workspace-build-0.0.1/target.tar.zst
Sun, 17 Sep 2023 07:51:06 GMT fedimint-workspace-build> /*stdin*\            : 42.13%   (  2.75 GiB =>   1.16 GiB, /nix/store/by7kqaq2jip4h2687hzjhylfdgldmj9f-fedimint-workspace-build-0.0.1/target.tar.zst)

tests, decompression:

Sun, 17 Sep 2023 07:52:59 GMT fedimint-test-all> decompressing cargo artifacts from /nix/store/slislj5ja57zb9jsd7vwqkf7dyh58vxn-fedimint-deps-0.0.1/target.tar.zst to target
Sun, 17 Sep 2023 07:53:02 GMT fedimint-test-all> decompressing cargo artifacts from /nix/store/by7kqaq2jip4h2687hzjhylfdgldmj9f-fedimint-workspace-build-0.0.1/target.tar.zst to target
Sun, 17 Sep 2023 07:53:05 GMT fedimint-test-all> configuring

Summary

                 before               vs   after
               size       ;   c;   d             size    ;   c;  d
deps-only: 3.59G =>  818 M;  4s;  4s  vs   3.74G =>  818M;  4s; 4s
workspace: 6.33G => 1.96 G;  7s;  7s  vs   2.75G => 1.16G;  3s; 6s

As far as I can tell it is working as expected.

dpc added a commit to dpc/fedimint that referenced this pull request Sep 17, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
Copy link
Owner

@ipetkov ipetkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for implementing this! The solution turned out way simpler than I had feared 🎉

Have a few minor clean up comments (plus we need to update the changelog), but happy to merge when they are addressed!

compressAndInstallCargoArtifactsDirIncremental "${dir}" "${cargoTargetDir}"
;;

"use-zstd-no-incr")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your attention to allowing one to opt into the previous behavior!

I'd okay to call out the new behavior (in the change log) and add a new environment variable (e.g. doNotCompressIncrementally) to installCargoArtifactsHook.sh which would give the old behavior back!

Copy link
Contributor Author

@dpc dpc Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call it useFullCargoArtifactsSnapshot. I think in certain cases it might be useful to use it even when there are no problems with the incremental approach (just to shed the chain of incremental archives, etc.)

However I'm confused how to propagate it to this function. Can I just use a global variable?

BTW. When I starting documentation changes, it did occurred to me again that I think it's better to just use different mode string for it.

When it's a separate flag, than it's unclear when it's used, when ignored and it becomes hard to understand. When it is its own mode, then there's no problem like this, and it's clear then it's a slightly different mode.

I propose:

  • use-zstd-full
  • use-zstd-incremental or use-zstd-diff
  • define the use-zstd and alias to "whatever we think is the best default", and make it use-zstd-incremental right now.

@ipetkov

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

installCargoArtifactsMode allows us to only specify one thing, meaning if we want to subtly tweak different installation strategies we'd have to keep tacking on use-zstd-flag-1, use-zstd-flag-2, use-zstd-flag-1-and-flag-2 which could get unweildy IMO

Copy link
Contributor Author

@dpc dpc Sep 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not going to support that many variations, IMO. But if it's neccessary it could be

installCargoArtifactsMode = { mode = "use-zstd"; diff = true; };

which crane can easily detect and handle, I think, with full freedom of parametrizing.

@ipetkov

lib/setupHooks/installCargoArtifactsHook.sh Show resolved Hide resolved
Comment on lines +29 to +30
rm -f "${cargoTargetDir}/.crane-previous-archive"
ln -s "${preparedArtifacts}" "${cargoTargetDir}/.crane-previous-archive"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine, though technically we're wasting a few operations waiting for the file system to delete/create the new marker file. Maybe we could store this in a (global) variable to be a bit cheaper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know (still don't) how to do that. :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can land as is and then follow-up with improvement?

BTW. I'm totally fine if you want to just redo/tweak and merge this PR as you please. For final touches it is generally faster than going back and forth. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I can take it to the finish line sometime this week!

dpc added a commit to dpc/fedimint that referenced this pull request Sep 18, 2023
This should greatly decrease the sizes of workspace
artifacts that crane produces.

See ipetkov/crane#387
@dpc
Copy link
Contributor Author

dpc commented Sep 19, 2023

After merging it in our CI pipeline, I've noticed MacOS builds are failing:

 clippy-stable-with-std> /private/tmp/nix-build-clippy-stable-with-std-2023-08-24.drv-0/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 14: install_name_tool: command not found
building '/nix/store/9d2ppvs57sbx5av3k9zhbi4a1nzp4vdc-rust-stable-with-components-2023-08-24.drv'...
rust-stable-with-components> /nix/store/mld55r1zcxzyk3mc9m113z5wgy9zm8gk-clippy-stable-with-std-2023-08-24/bin:
rust-stable-with-components> rustdoc: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/bin/rustdoc
rust-stable-with-components> rust-gdbgui: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/bin/rust-gdbgui
rust-stable-with-components> rust-lldb: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/bin/rust-lldb
rust-stable-with-components> rustc: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/bin/rustc
rust-stable-with-components> rust-gdb: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/bin/rust-gdb
rust-stable-with-components> /nix/store/mld55r1zcxzyk3mc9m113z5wgy9zm8gk-clippy-stable-with-std-2023-08-24/lib:
rust-stable-with-components> librustc_driver-7e633afbb0050726.dylib: /nix/store/f76rb077fkwgygj6b80hphnvv9x2yxkj-rust-stable-with-std-2023-08-24/lib/librustc_driver-7e633afbb0050726.dylib
rust-stable-with-components> /private/tmp/nix-build-rust-stable-with-components-2023-08-24.drv-0/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 14: install_name_tool: command not found
rust-stable-with-components> /private/tmp/nix-build-rust-stable-with-components-2023-08-24.drv-0/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 14: install_name_tool: command not found
rust-stable-with-components> /private/tmp/nix-build-rust-stable-with-components-2023-08-24.drv-0/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 14: install_name_tool: command not found

and then missing files that should have been extracted from deps only build. Weird. Will try to rebase and debug.

@dpc
Copy link
Contributor Author

dpc commented Sep 19, 2023

Will try to rebase and debug.

Ah, damn it. sqlite-sys's build.rs uses std::fs::copy to generate the file by using checked-in copy.

Unfotunately on MacOS that uses fcopyfile:

The fcopyfile function preserves the modification, access, and creation timestamps from the source file to the destination file by default.

So the generated file copied the mtime from the existing one, and gets skipped. Bummer. Didn't take long to hit a corner case.

Another reason why use-zstd-full will be required sometimes, but also I have a simple solution - if the given deriation doesn't inherit any artifacts, it should just always do the full snapshot.

@dpc
Copy link
Contributor Author

dpc commented Sep 19, 2023

Re: rust-lang/rust#115982

@j-baker
Copy link
Contributor

j-baker commented Sep 19, 2023

Yep - that's (among other reasons) why I didn't like the look of anything which relies on FS metadata - too easy to mess it up accidentally, especially when cross platform. Agree that it'll mostly work, though.

It's worth noting that rust buildscripts can do anything, I'm sure you're aware of the cmake crate. Openssl literally compiles ssl in its build...

inotify might be another viable approach? subscribe to the dir pre-build, watch for events and stash them for later?

@dpc
Copy link
Contributor Author

dpc commented Sep 19, 2023

I think it's OK as is for now. Thanks to using a full mode in deps only step, the worst case (build.rs scripts of 3rd party tools) are virtually never going to get affected by this. And it's always possible to revert to use-zstd-full.

It would be possible make a use-zstd-diff-precise. During decompression it would store somewhere next to ./target a list of all paths with their checksums, then during compression use mtime as a first fast check, and if 0, double-verify the existence, then checksum. Since everything here is basically IO-bound, an extra CPU time would get largely amortized and everything here should parallize well. Hey did I mentioned that I wrote tools like rdedup, rblake2sum that suck in data as fast as IO let them? :) . But I don't want to get into it right now, I'm supposed to be working on features, and can be polishing the CI the whole time. :D

@ipetkov
Copy link
Owner

ipetkov commented Sep 23, 2023

I think for now we can force Darwin builds to do a non-incremental compression until rust-lang/rust#115982 is resolved. We can also follow up with other optimizations later! (I think there's value in landing this in the mean time)

@ipetkov
Copy link
Owner

ipetkov commented Sep 23, 2023

@dpc I pushed my changes to a separate branch (I had forked off before you had made other changes and I didn't want to necessarily clobber your branch): #398

I think I've addressed all the points we've brought up here (for the time being), feel free to take it for a spin

@ipetkov
Copy link
Owner

ipetkov commented Oct 18, 2023

I'm going to close this since #398 has landed (though feel free to open a new PR if there's anything else worth changing). Thanks again for the feedback and initial direction here!

@ipetkov ipetkov closed this Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A way to output just a diff of ./target
3 participants