feat: Support adding sparse files to archives #375
Thanks for the PR! I'll apologize though in that I don't have a huge amount of time and energy for this crate so while this looks quite good it's also large-ish so will probably take some time to land. If you're up for making some changes though I think this can be made easier to land perhaps.
- For turning this on-by-default the main thing that I'd be worried about is that this can affect Cargo. Cargo wants good compatibility with older versions of Cargo as well so I suspect that Cargo will want to disable this by default when updating. Would you be up for helping to coordinate that? I'll note though that I agree turning this on-by-default is reasonable.
- For depending on `nix`, I'd personally prefer not to. How difficult would it be to use `libc` and syscalls there?
- Could you write some more comments in various places? I don't have the tar specification in my head unfortunately so if you're able to link to various places as to what sparse is and how it works that'd be helpful. For example there's one location where it says `let mut it = entries.entries.iter().skip(4).peekable();` but I'm not sure why 4 entries are skipped.
- Do you think there's some extra tests that could be written? The tests of `find_sparse_entries` look pretty good (thanks!) but the integration tests in `tests/all.rs` are relatively light. If you feel everything is covered though that's also ok.
And finally, this is only if you're interested and is optional, but passing around `sparse: bool` like the other parameters looks like it's not the greatest thing in the world. It might be best to wrap up options like that in some sort of struct stored in the builder which can be passed around to avoid having lots of parameters. If you're interested in such a refactoring I think it'd be worth doing that, but again it's optional and I don't think it's required for this.
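A minimal sketch of the kind of refactoring suggested here, assuming a simplified builder. `BuilderOptions`, its fields, and `describe` are hypothetical names for illustration and are not taken from the crate:

```rust
/// Hypothetical bundle of per-builder settings (names are illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
struct BuilderOptions {
    follow_symlinks: bool,
    sparse: bool,
}

impl Default for BuilderOptions {
    fn default() -> Self {
        // Assumed defaults for this sketch: both features on.
        BuilderOptions { follow_symlinks: true, sparse: true }
    }
}

/// Simplified stand-in for the real builder; it only stores the options.
struct Builder {
    options: BuilderOptions,
}

impl Builder {
    fn new() -> Self {
        Builder { options: BuilderOptions::default() }
    }

    /// Setter mirroring the `sparse: bool` toggle discussed in this PR.
    fn sparse(&mut self, sparse: bool) {
        self.options.sparse = sparse;
    }

    /// Internal helpers receive one `&BuilderOptions` instead of many bools.
    fn append_file(&self, name: &str) -> String {
        describe(name, &self.options)
    }
}

/// Hypothetical helper standing in for the internal append functions.
fn describe(name: &str, options: &BuilderOptions) -> String {
    format!(
        "{name}: sparse={}, follow_symlinks={}",
        options.sparse, options.follow_symlinks
    )
}
```

Adding a new setting then only touches the struct and its default, rather than every function signature along the call chain.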
tests/all.rs
Outdated
assert!(data.len() <= 0x4_000); // Typically 0x1800 but potentially might
                                // vary based on a filesystem.
Could this entire test run unconditionally on all systems, and this assert be limited to only the known OSes that implement sparse file support?
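One way this could look, as a sketch only: the check always runs, and the size bound is enforced only on a list of known OSes. The OS list and the helper names `strict_sparse_check` and `check_archive_len` are assumptions for illustration, not the test's actual code:

```rust
/// True on targets where the strict size bound is expected to hold.
/// The OS list here is an assumption made for this sketch.
fn strict_sparse_check() -> bool {
    cfg!(any(target_os = "linux", target_os = "freebsd"))
}

/// Hypothetical check: a loose validation on every platform, plus the
/// `0x4_000` bound only where sparse support is known to work.
fn check_archive_len(archive_len: usize) -> Result<(), String> {
    // Loose check, applied everywhere: something must have been written.
    if archive_len == 0 {
        return Err("archive is empty".to_string());
    }
    // Strict check, applied only on the known OSes.
    if strict_sparse_check() && archive_len > 0x4_000 {
        return Err(format!("archive too large: {archive_len:#x} bytes"));
    }
    Ok(())
}
```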
Sure. What we need is to add a few Note Starting from this version, the Also, for a timeline, the corresponding
Not difficult, just a bit more of
Done. As for 4, I've added a const
Updated the integration test to be more comprehensive.
I've prepended a commit to introduce Besides that, I've updated the list of supported platforms, see today's edits of the PR description.
I apologize for the delay in getting back to this, but I also want to express many thanks again for your work on this! I really appreciate your willingness to send a PR here and put up with me being slow, and I think this has shaped up well! Given the age of sparse file support I think it might be reasonable to start off opening an issue on the Cargo repo alerting maintainers to the status of the situation and asking them if they think it's worth adding
Also, if you're willing, I'd love to have help maintaining this repo if you're interested!
I've created an issue at the Cargo repo (see the link above).
Yes, I'd be happy to help maintain this repo.
Thanks! That also reminds me that I'd forgotten to publish these changes, which I've now done. I've added you and @cgwalters to this repo now. Feel free to review/merge PRs and/or help with issue triage as you see fit! I'll continue to help out with publication and actually running
I created #379 to track followup to this (having important discussion on a closed PR feels odd).
This PR extends `Builder` to support archiving sparse files in the GNU format. This is the format that GNU `tar` produces by default when given the `--sparse` option, assuming `POSIXLY_CORRECT` is not set.
Implementing sparse archives gives two useful properties:
Design choices in this PR
It uses the gnu/oldgnu format rather than the newer `GNU.sparse` extension to the PAX format.
Reasons:
`GnuHeader` struct already.
The PR to support the PAX format, "Add support for PAX Format, Version 1.0" (#298), remains open.
It is enabled by default, as in BSD tar and unlike GNU tar.
It's configured using the `Builder::sparse(&mut self, sparse: bool)` method.
If you think it should be configured via `HeaderMode` instead, please suggest the desired enum layout.
It uses `SEEK_HOLE` to detect holes in the file. For comparison, other methods and implementations are listed below.
BSD tar
SEEK_HOLE
Linux 3.1+
FIOMAP
FSCTL_QUERY_ALLOCATED_RANGES
The "Raw" method of GNU tar ignores actual holes and works by reading the file in 512-byte blocks and comparing them with zeros.
I did not include it as I believe it puts this feature into the "compression algorithm" realm, rather than the "preserve filesystem properties" realm.
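For comparison only, here is a rough sketch of what such a "raw" scan could look like. It is not part of this PR and is not GNU tar's code; `find_data_segments` is a hypothetical name, and the function simply reports runs of 512-byte blocks that contain any non-zero byte:

```rust
use std::io::{self, Read};

/// Block size used by the scan, matching the 512-byte blocks mentioned above.
const BLOCK: usize = 512;

/// Returns `(offset, length)` pairs for runs of blocks containing non-zero bytes.
/// All-zero blocks are treated as holes, whether or not the filesystem
/// actually stores them sparsely.
fn find_data_segments<R: Read>(mut reader: R) -> io::Result<Vec<(u64, u64)>> {
    let mut segments: Vec<(u64, u64)> = Vec::new();
    let mut buf = [0u8; BLOCK];
    let mut offset = 0u64;
    loop {
        // Fill one block, tolerating short reads, until the block is full or EOF.
        let mut filled = 0;
        while filled < BLOCK {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        if buf[..filled].iter().any(|&b| b != 0) {
            match segments.last_mut() {
                // Extend the previous segment when this block is adjacent to it.
                Some((start, len)) if *start + *len == offset => *len += filled as u64,
                _ => segments.push((offset, filled as u64)),
            }
        }
        offset += filled as u64;
        if filled < BLOCK {
            break; // A short block means EOF was reached.
        }
    }
    Ok(segments)
}
```

Note that this reads every byte of the file, which is the cost `SEEK_HOLE` avoids by asking the filesystem where the holes are.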
I think implementing the `FSCTL_QUERY_ALLOCATED_RANGES` method for Windows would be a useful addition, but currently, I don't have much interest in it.
Linux, Solaris-like, and FreeBSD-like systems are supported.
After "Add `SeekData` and `SeekHole` to `Whence` for hurd/apple targets" (nix-rust/nix#2473) got into the release (v0.30.0 I think), it could be possible to extend it to Hurd and Apple.
Linux
freebsd
hurd
solaris
illumos
dragonfly
netbsd
openbsd
macos
A new dependency, `nix`, is pulled in for convenience.
As a consequence, the MSRV is raised to 1.69.
It's possible to rewrite it with unsafe `libc` calls if that's preferred.
During archive creation, it allocates 16 bytes of memory per hole (see `struct SparseEntries`).
In the worst pathological case, a file in a "zebra" pattern (4096 bytes of data followed by a 4096-byte hole), it will allocate 2 MiB per GiB of file.
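The 2 MiB figure follows from the numbers above: 1 GiB holds 131072 data-plus-hole pairs of 8192 bytes each, and 131072 × 16 bytes is 2 MiB. A small sketch of that arithmetic, where `worst_case_bytes` is a hypothetical helper and not part of the PR:

```rust
/// Bookkeeping bytes per hole, as stated in the PR description.
const BYTES_PER_HOLE: u64 = 16;

/// Estimated allocation for a file that alternates `data_len` bytes of data
/// with `hole_len` bytes of hole over `file_len` bytes in total.
fn worst_case_bytes(file_len: u64, data_len: u64, hole_len: u64) -> u64 {
    // One hole per (data + hole) period.
    let holes = file_len / (data_len + hole_len);
    holes * BYTES_PER_HOLE
}
```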
If it's a concern, I think it's possible to implement it in an allocation-free way for `Builder<SW> where W: Seek + Write`.
Two tests involving sparse files are added.
Tests are run unconditionally, but strict checks are only done in the following cases:
- `tests/all.rs` `writing_sparse`: strict only on Linux. Not strict on FreeBSD due to the UFS caveat mentioned above.
- `src/builder.rs` `test_find_sparse_entries`: strict on Linux and FreeBSD.
Potentially, these strict checks might fail when run on a filesystem that doesn't support holes.