Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add doc hint that default is different than tar #366

Merged

Conversation

schneems
Copy link
Contributor

@schneems schneems commented Jun 6, 2024

The default of tar is to include a symlink in the archive. The default of tar::Builder is to resolve symlinks and replace them with the resulting file.

This commit clarifies that difference by highlighting that follow_symlinks(true) provides the same behavior as tar -h.

Context

I ran into a problem that caused much head-scratching: calling tar resulted in much smaller file sizes than what I thought was the comparable Rust code. I created a reproduction as part of that debugging process (if you're interested): https://github.com/schneems/tar_comparison/blob/bfd420a012b46e80435cf4e7c67ca1661357fde3/README.md.

It turns out this was the issue I was facing:

$ tar -tvzf system_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 rschneeman staff 12364984 Jun  5 16:13 lib/libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so.3.1 -> libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so -> libruby.so.3.1.6

$ tar -tvzf rust_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1.6
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so

In the "system" archive produced by the tar cli there's one so binary and two symlinks to the same file. In the "rust" version (using tar::Builder) there are three copies of the same binary.

I previously saw that option's documentation when I was debugging my issue but didn't fully internalize the implications. By documenting what feature this is similar to in (system) tar, I hope the reader will make a connection that the two have different defaults.

The default of `tar` is to include a symlink in the archive. The default of `tar::Builder` is to resolve symlinks and replace them with the resulting file.

This commit helps to clarify that difference by highlighting that `follow_symlinks(true)` provides the same behavior as `tar -h`.

## Context

This caused me much head-scratching in a problem where calling `tar` resulted in much lower file sizes than what I thought was the comparable rust code. I created a reproduction as part of that debugging process (if you're interested) https://github.com/schneems/tar_comparison/blob/bfd420a012b46e80435cf4e7c67ca1661357fde3/README.md.

It turns out this was the issue I was facing:

```
$ tar -tvzf system_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 rschneeman staff 12364984 Jun  5 16:13 lib/libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so.3.1 -> libruby.so.3.1.6
lrwxr-xr-x  0 rschneeman staff        0 Jun  5 16:14 lib/libruby.so -> libruby.so.3.1.6

$ tar -tvzf rust_tar_gzip_one_operation.tar.gz | grep libruby.so
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1.6
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so.3.1
-rwxr-xr-x  0 501    20   12364984 Jun  5 16:13 lib/libruby.so
```

In the "system" archive produced by the `tar` cli there's one `so` binary and two symlinks to the same file. In the "rust" version (using `tar::Builder`) there are three copies of the same binary.

I previously saw that option's documentation when I was debugging my issue but didn't fully internalize the implications. My hope is that by documenting what feature this is similar to in (system) `tar`, that the reader will make a connection that the two have different defaults.
@@ -40,6 +40,9 @@ impl<W: Write> Builder<W> {

/// Follow symlinks, archiving the contents of the file they point to rather
/// than adding a symlink to the archive. Defaults to true.
///
/// When true, it exhibits the same behavior as GNU `tar` command's
/// `--dereference` or `-h` options <https://man7.org/linux/man-pages/man1/tar.1.html>.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting on my own PR to list some alternatives. Feel free to dismiss this comment (if you prefer the current PR), or comment below if you want to explore an alternative.

Alternatives

Copy external symlinks, preserve internal symlinks

Instead of attempting to fix this issue via documentation, the project could add an additional mode that preserves symlinks within the project but copies externally symlinked files. If that were on by default, it would help preserve the existing behavior while minimizing archive size.

Consequences:

  • If this were on by default, I never would have noticed a difference between the defaults.
  • I'm unsure if there are edge cases (such as systems that don't support symlinks) where this could break existing workflows. To find out: We could add the option before defaulting it to be on.

Different doc changes

We could change the existing documentation and memorialize the existing behavior:

/// Copy the resolved contents of symlinks.
///
/// When enabled, any symlinks will be replaced by the contents of the file
/// that they point to. If the archive has relative symlinks to other files 
/// within the same archive, those files will be duplicated. Defaults to true.

Consequences:

  • This setting could result in a larger file size. The words "copy" and "duplicated" emphasize the result of this setting.
  • The problem I ran into is spelled out (versus adding the tar options link, which is only hinted at).
  • The wording around symlinks is very hard to follow in general. I wonder if the user would get the message warning; this may cause your archive size to be quite large. It balances saying enough to get the point across without overwhelming the reader.
  • This documentation could be seen as stabilizing behavior and might increase resistance to implementing "Copy external symlinks, preserve internal" in the future.

Add a doc test demonstrating current behavior

Show the same scenario with and without the option on. That would highlight what it's doing without getting bogged down in too much prose.

Consequences:

  • Would make the docs longer.
  • Code is less ambiguous than prose.

@schneems schneems marked this pull request as ready for review June 6, 2024 21:42
@alexcrichton alexcrichton merged commit cdf5b6f into alexcrichton:main Jul 8, 2024
7 checks passed
@schneems
Copy link
Contributor Author

schneems commented Jul 8, 2024

Thanks!

@schneems schneems deleted the schneems/doc-follow_symlinks branch July 8, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants