use generics instead of dynamic dispatch internally in `tar::Builder` #395
base: main
Conversation
And that would happen here when your tar output is to a file, and you're providing a … OK, though I do kind of question how often this occurs, because most use cases for tar end up being compressed (i.e. the output side is not a fd). Can you briefly explain your higher-level use case and what led you to try this optimization?
Fair point on the fact that this is mostly used for compression purposes. However, this specialization also applies to writers like … I stumbled on this when someone told me they were trying to archive a Python environment and tarring was taking a significant portion of the time. I then looked through the implementation and realized many syscalls were being issued that could be significantly reduced. Although this optimization may dramatically reduce the number of syscalls, it seldom results in an equally large reduction in wall-clock time; still, the gain is not negligible. This can be quite useful when you'd like to write to an archive temporarily, even if it is going to be compressed afterwards, or when you want to pipe the result into another program to handle the compression. I guess what surprised me is that since …
There are certainly use cases for uncompressed tar; over in OCI land we were discussing that uncompressed tar should be used when the embedded data is already compressed - AI models can be like this. This helps avoid a problem in OCI, because it checksums compressed data, and that can make it hard to ensure reproducible caching. Anyway though... the change sounds reasonable, but this crate is widely used and I personally try to be pretty conservative with what we merge. This type of change has the potential to cause regressions (probably in code size mostly - but certainly not exclusively). Can you please provide a bit more benchmarking in a targeted use case?
A bit more data? Are we talking 1%? 10%? As far as syscall traffic, of course others have been looking at io-uring based work (see #355), but that's a much deeper change.
Hi, I understand you being conservative and I appreciate it! Unfortunately I am not able to provide benchmarks more specific than those already in the original description, but the perf benefits are of the order shown in the previous benchmarks (up to 38%, but mostly around 10%). I am certainly looking at io-uring based alternatives myself, but I agree that's a much larger change and may not fit in this crate. This PR, though, I thought would benefit upstream as a sensible optimization without too many downsides. I'd be interested to know what other downsides you think this would cause downstream, and whether there is a specific example you might be worried about. I'd also be happy to consider feature-flagging this if you think that'd be more appropriate.
IIRC there are desires to extract and create a sans-IO core tar crate |
TL;DR Using generics instead of dynamic dispatch allows for specialization and a performance win on modern Linux systems.

(look here for full benchmark details)

This PR is specifically targeted at allowing the specialization of `Write` implementations to trigger when using `tar::Builder`. In particular, the stdlib has a specialization on Linux for `std::io::copy` that allows it to use the `copy_file_range` syscall. Unfortunately the code as-is is not capable of performing such specialization (or potentially others), since it relies on dynamic dispatch internally. This PR changes the underlying implementation to use generics instead of dynamic dispatch. This understandably comes at the potential cost of increased code size, but I'd argue this choice is already exposed to users, who should be willing to pay the code-size cost when using `tar::Builder<T>` directly instead of `tar::Builder<dyn Write>`. Users looking for a reduced code footprint can explicitly opt in by using `tar::Builder<dyn Write>` instead.

PS: I had to change one of the tests because it looks like this specialization behaves differently than before. Previously, since the `pax_simple_write` test was using a `BufWriter`, it would try to `append_file` the archive inside the archive itself. That would work because the writes to the file would be buffered, and thus reading the archive at that point would read nothing. But with `copy_file_range` that is no longer true. Anyway, I am pretty sure this was a bug in the test code, but if not, the correct fix would be to do some sort of inode tracking like tar does.