
use generics instead of dynamic dispatch internally in tar::Builder #395

Open · wants to merge 2 commits into base: main

Conversation

@GabrielDertoni commented Jan 12, 2025

TL;DR: Using generics instead of dynamic dispatch allows for specialization and a performance win on modern Linux systems.

| bench | old time | new time | delta |
| --- | --- | --- | --- |
| cache=yes size=583M files=23k | 1.9 s ± 0.1 s | 1.7 s ± 0.1 s | -10.08% |
| cache=no size=583M files=23k | 7.3 s ± 0.2 s | 6.7 s ± 0.3 s | -8.95% |
| cache=yes size=284M files=1k | 500 ms ± 4 ms | 362 ms ± 28 ms | -38.12% |
| cache=no size=284M files=1k | 918 ms ± 25 ms | 889 ms ± 29 ms | -3.26% |

(look here for full benchmark details)

This PR is specifically targeted at allowing the specialization of Write implementations to trigger when using tar::Builder. In particular, the stdlib has a specialization on Linux for std::io::copy that allows it to use the copy_file_range syscall. Unfortunately, the code as-is cannot perform that specialization (or potentially others) since it relies on dynamic dispatch internally. This PR changes the underlying implementation to use generics instead of dynamic dispatch. This understandably comes at the potential cost of increased code size, but I'd argue that choice is already exposed to users: anyone using tar::Builder<T> directly instead of tar::Builder<dyn Write> should be willing to pay the code-size cost. Users looking for a reduced code footprint can still explicitly opt in by using tar::Builder<dyn Write> instead.
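To make the mechanism concrete, here is a minimal sketch (simplified names, not the crate's actual internals) of why type erasure blocks the stdlib fast path: io::copy can only pick the copy_file_range path when it can see the concrete writer type.

```rust
use std::fs::File;
use std::io::{self, Write};

// Before (simplified): the writer type is erased, so io::copy only sees
// `dyn Write` and falls back to the generic read/write loop.
struct ErasedBuilder<'a> {
    obj: Box<dyn Write + 'a>,
}

impl<'a> ErasedBuilder<'a> {
    fn append(&mut self, src: &mut File) -> io::Result<u64> {
        io::copy(src, &mut self.obj)
    }
}

// After (simplified): the concrete writer type flows through, so when
// `W` is e.g. `File`, the stdlib's Linux specialization of io::copy can
// use copy_file_range(2).
struct GenericBuilder<W: Write> {
    obj: W,
}

impl<W: Write> GenericBuilder<W> {
    fn append(&mut self, src: &mut File) -> io::Result<u64> {
        io::copy(src, &mut self.obj)
    }
}
```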

PS: I had to change one of the tests because this specialization behaves differently from what came before. Previously, since the pax_simple_write test was using a BufWriter, it would try to append_file the archive inside the archive itself. That used to work because the writes to the file were buffered, so reading the archive at that point would read nothing. With copy_file_range that is no longer true. Anyway, I am pretty sure this was a bug in the test code; if not, the correct fix would be to do some sort of inode tracking like tar does.
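For reference, a hedged reconstruction of that test scenario using the crate's public API (file names invented):

```rust
use std::fs::File;
use std::io::BufWriter;

fn main() -> std::io::Result<()> {
    let out = File::create("archive.tar")?;
    let mut builder = tar::Builder::new(BufWriter::new(out));
    // Appending the archive into itself: with a BufWriter nothing has been
    // flushed yet, so the old generic copy loop read zero bytes and the
    // append "succeeded" vacuously. With copy_file_range the kernel reads
    // the real, partially written on-disk contents, so the result changes.
    builder.append_file("archive.tar", &mut File::open("archive.tar")?)?;
    builder.finish()?;
    Ok(())
}
```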

@cgwalters (Collaborator)

> In particular, the stdlib has a specialization on Linux for std::io::copy that allows it to use the copy_file_range syscall.

And that would happen here when your tar output goes to a file and you're providing a File instance as input, e.g. via append()? OK. (Or a pipe/socket for either of those.)

OK, though I do kind of question how often this occurs, because most use cases for tar end up being compressed (i.e. the output side is not a fd).

Can you briefly explain your higher-level use case and what led you to try this optimization?

@GabrielDertoni (Author) commented Jan 17, 2025

Fair point that this is mostly used for compression purposes. However, this specialization also applies to writers like BufWriter<T>, whose larger internal buffers can be used directly, avoiding one unnecessary copy.
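A small self-contained sketch of that contrast (file names are placeholders): the first copy targets a concrete BufWriter<File>, where the specialized path applies; the second goes through dyn Write and takes the generic loop with an intermediate buffer.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Seek, Write};

fn main() -> io::Result<()> {
    let mut src = File::open("input.bin")?;

    // Concrete writer type: io::copy sees BufWriter<File> and can take the
    // specialized path (filling the internal buffer directly, or using
    // copy_file_range on Linux).
    let mut fast = BufWriter::new(File::create("out_fast.bin")?);
    io::copy(&mut src, &mut fast)?;
    fast.flush()?;

    src.rewind()?;

    // Type-erased writer: io::copy only sees `dyn Write` and falls back to
    // the generic loop with an intermediate stack buffer.
    let mut slow: Box<dyn Write> = Box::new(BufWriter::new(File::create("out_slow.bin")?));
    io::copy(&mut src, &mut slow)?;
    slow.flush()?;
    Ok(())
}
```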

I stumbled on this when someone told me they were trying to archive a Python environment and tarring was taking a significant portion of the time. I looked through the implementation and realized that many syscalls were being issued that could be significantly reduced. Although this optimization can dramatically reduce the number of syscalls in some cases, it seldom yields an equally large time reduction; still, the gain is not negligible.

This can be quite useful when you'd like to write to an archive temporarily, even if it is going to be compressed afterwards, or when you want to pipe the result into another program that handles the compression. I guess what surprised me is that since tar::Builder is a generic type, my performance expectation was that it would monomorphize and benefit where it can from that. So I found it quite odd that it uses dynamic dispatch internally.
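As a sketch of the piping use case mentioned above (assuming a zstd binary on PATH; paths and flags are illustrative): the Builder writes straight into the compressor's stdin, so the tar stream itself stays uncompressed and the stdlib's Linux fast paths for pipes (e.g. sendfile/splice) can still apply.

```rust
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn an external compressor; zstd reads stdin when no input file is given.
    let mut child = Command::new("zstd")
        .args(["-q", "-o", "archive.tar.zst"])
        .stdin(Stdio::piped())
        .spawn()?;

    {
        // The Builder's writer is the pipe to zstd's stdin (ChildStdin
        // implements Write), so the archive bytes never touch the disk
        // uncompressed.
        let stdin = child.stdin.take().expect("stdin was piped");
        let mut builder = tar::Builder::new(stdin);
        builder.append_dir_all("data", "./data")?;
        builder.finish()?;
    } // dropping the writer closes the pipe so zstd can finish

    child.wait()?;
    Ok(())
}
```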

@cgwalters (Collaborator)

There are certainly use cases for uncompressed tar; over in OCI land we were discussing that uncompressed tar should be used when the embedded data is already compressed (AI models can be like this). This helps avoid a problem in OCI: because it checksums compressed data, reproducible caching can be hard to ensure.

Anyway though, the change sounds reasonable, but this crate is widely used and I personally try to be pretty conservative about what we merge. This type of change has the potential to cause regressions (probably mostly in code size, but certainly not exclusively).

Can you please provide a bit more benchmarking in a targeted use case?

> I stumbled on this when someone told me they were trying to archive a Python environment and tarring was taking a significant portion of the time. I looked through the implementation and realized that many syscalls were being issued that could be significantly reduced. Although this optimization can dramatically reduce the number of syscalls in some cases, it seldom yields an equally large time reduction; still, the gain is not negligible.

A bit more data? Are we talking 1%? 10%? As for syscall traffic, of course others have been looking at io-uring based work (see #355), but that's a much deeper change.

@GabrielDertoni (Author)

Hi, I understand you being conservative, and I appreciate it! Unfortunately I am not able to provide more specific benchmarks than the ones already in the original description, but the performance benefits are on the order shown in those benchmarks (up to 38%, but mostly around 10%). I am certainly looking at io-uring based alternatives myself, but I agree that's a much larger change and may not fit in this crate. This PR, though, I thought would benefit upstream as a sensible optimization without too many downsides. I'd be interested to know what other downsides you think this could cause downstream, and whether there is a specific example you are worried about. I'd also be happy to consider feature-flagging this if you think that would be more appropriate.

@NobodyXu

> I am certainly looking at io-uring based alternatives myself, but I agree that's a much larger change and may not fit in this crate.

IIRC there is a desire to extract a sans-IO core tar crate.
