-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add doc hint that default is different than tar
#366
Add doc hint that default is different than tar
#366
Conversation
The default of `tar` is to include a symlink in the archive. The default of `tar::Builder` is to resolve symlinks and replace them with the resulting file. This commit helps to clarify that difference by highlighting that `follow_symlinks(true)` provides the same behavior as `tar -h`. ## Context This caused me much head-scratching in a problem where calling `tar` resulted in much lower file sizes than what I thought was the comparable rust code. I created a reproduction as part of that debugging process (if you're interested) https://github.com/schneems/tar_comparison/blob/bfd420a012b46e80435cf4e7c67ca1661357fde3/README.md. It turns out this was the issue I was facing: ``` $ tar -tvzf system_tar_gzip_one_operation.tar.gz | grep libruby.so -rwxr-xr-x 0 rschneeman staff 12364984 Jun 5 16:13 lib/libruby.so.3.1.6 lrwxr-xr-x 0 rschneeman staff 0 Jun 5 16:14 lib/libruby.so.3.1 -> libruby.so.3.1.6 lrwxr-xr-x 0 rschneeman staff 0 Jun 5 16:14 lib/libruby.so -> libruby.so.3.1.6 $ tar -tvzf rust_tar_gzip_one_operation.tar.gz | grep libruby.so -rwxr-xr-x 0 501 20 12364984 Jun 5 16:13 lib/libruby.so.3.1.6 -rwxr-xr-x 0 501 20 12364984 Jun 5 16:13 lib/libruby.so.3.1 -rwxr-xr-x 0 501 20 12364984 Jun 5 16:13 lib/libruby.so ``` In the "system" archive produced by the `tar` cli there's one `so` binary and two symlinks to the same file. In the "rust" version (using `tar::Builder`) there are three copies of the same binary. I previously saw that option's documentation when I was debugging my issue but didn't fully internalize the implications. My hope is that by documenting what feature this is similar to in (system) `tar`, that the reader will make a connection that the two have different defaults.
@@ -40,6 +40,9 @@ impl<W: Write> Builder<W> { | |||
|
|||
/// Follow symlinks, archiving the contents of the file they point to rather | |||
/// than adding a symlink to the archive. Defaults to true. | |||
/// | |||
/// When true, it exhibits the same behavior as GNU `tar` command's | |||
/// `--dereference` or `-h` options <https://man7.org/linux/man-pages/man1/tar.1.html>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting on my own PR to list some alternatives. Feel free to dismiss this comment (if you prefer the current PR), or comment below if you want to explore an alternative.
Alternatives
Copy external symlinks, preserve internal symlinks
Instead of attempting to fix this issue via documentation, the project could add an additional mode that preserves symlinks within the project but copies externally symlinked files. If that were on by default, it would help preserve the existing behavior while minimizing archive size.
Consequences:
- If this were on by default, I never would have noticed a difference between the defaults.
- I'm unsure if there are edge cases (such as systems that don't support symlinks) where this could break existing workflows. To find out: We could add the option before defaulting it to be on.
Different doc changes
We could change the existing documentation and memorialize the existing behavior:
/// Copy the resolved contents of symlinks.
///
/// When enabled, any symlinks will be replaced by the contents of the file
/// that they point to. If the archive has relative symlinks to other files
/// within the same archive, those files will be duplicated. Defaults to true.
Consequences:
- This setting could result in a larger file size. The words "copy" and "duplicated" emphasize the result of this setting.
- The problem I ran into is spelled out (versus adding the
tar
options link, which is only hinted at). - The wording around symlinks is very hard to follow in general. I wonder if the user would get the message warning; this may cause your archive size to be quite large. It balances saying enough to get the point across without overwhelming the reader.
- This documentation could be seen as stabilizing behavior and might increase resistance to implementing "Copy external symlinks, preserve internal" in the future.
Add a doc test demonstrating current behavior
Show the same scenario with and without the option on. That would highlight what it's doing without getting bogged down in too much prose.
Consequences:
- Would make the docs longer.
- Code is less ambiguous than prose.
Thanks! |
The default of
tar
is to include a symlink in the archive. The default oftar::Builder
is to resolve symlinks and replace them with the resulting file.This commit clarifies that difference by highlighting that
follow_symlinks(true)
provides the same behavior astar -h
.Context
I ran into a problem that caused much head-scratching: calling
tar
resulted in much smaller file sizes than what I thought was the comparable Rust code. I created a reproduction as part of that debugging process (if you're interested): https://github.com/schneems/tar_comparison/blob/bfd420a012b46e80435cf4e7c67ca1661357fde3/README.md.It turns out this was the issue I was facing:
In the "system" archive produced by the
tar
cli there's oneso
binary and two symlinks to the same file. In the "rust" version (usingtar::Builder
) there are three copies of the same binary.I previously saw that option's documentation when I was debugging my issue but didn't fully internalize the implications. By documenting what feature this is similar to in (system)
tar
, I hope the reader will make a connection that the two have different defaults.