fix for archiving long paths that have path components starting with ".." crossing the 100-character mark #390

Merged
merged 5 commits into from
Dec 5, 2024

Conversation

mitrandir77
Contributor

GNU tar supports arbitrary path lengths by putting the path, truncated to the standard 100 bytes, into the header and storing the full path in the contents of an extra entry. tar-rs validates that no path component is exactly "..", but when a component that merely starts with ".." (for example, a file named "..some_file") gets truncated right after its first 2 characters, the header path ends in "..", we hit this validation, and the file can't be archived.

I have verified that the GNU tar command can handle such paths and actually puts the truncated ".." in the header. This pull request makes tar-rs behave the same way.
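
To make the failure mode concrete, here is a minimal std-only sketch (it does not use tar-rs itself). The helper names `truncate_for_header`, `has_parent_dir_component`, and `straddling_path` are hypothetical and exist only for illustration; the truncation mirrors the `valid_up_to()` fallback visible in the diff hunk of this PR.

```rust
use std::path::{Component, Path};

/// Size of the `name` field in a tar header, in bytes.
const NAME_FIELD_LEN: usize = 100;

/// Returns the longest valid UTF-8 prefix of `path` that fits in the
/// 100-byte name field (hypothetical helper, for illustration only).
fn truncate_for_header(path: &str) -> &str {
    if path.len() <= NAME_FIELD_LEN {
        return path;
    }
    let bytes = &path.as_bytes()[..NAME_FIELD_LEN];
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        // The cut landed inside a multi-byte character: back up to the
        // last valid character boundary.
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    }
}

/// True if any component of `path` is a parent-directory reference (`..`).
fn has_parent_dir_component(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}

/// Builds a path whose last component, `..some_file`, starts at byte 98,
/// so exactly its two leading dots fall inside the first 100 bytes.
fn straddling_path() -> String {
    // 97 bytes of directory name + '/' = 98 bytes, then "..some_file".
    format!("{}/..some_file", "a".repeat(97))
}
```

With these helpers, `has_parent_dir_component(&straddling_path())` is false for the full path, but becomes true for `truncate_for_header(&straddling_path())`, because the truncated string ends in `/..`. That is the situation in which a check for ".." components rejects a perfectly legitimate file name.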

See tests for repro of the issue.

(I think the code for handling this special case is quite ugly - I'd appreciate suggestions for making it better)

…ith .. at 100-character mark

This test checks a very particular scenario where a path component starting with
".." in a long path gets split at the 100-byte mark, so that ".." goes into the header
and gets interpreted as a parent dir (and rejected).
This is a fix for a very specific scenario where a path component starting with
".." in a long path gets split at the 100-byte mark, so that ".." goes into the header
and gets interpreted as a parent dir (and rejected).

See tests for repro of the issue.
Collaborator

cgwalters left a comment

Only a superficial code-style skim so far. I haven't deeply analyzed the code, but it looks plausible.

tests/all.rs Outdated
let mut ar = Builder::new(Vec::new());

let mut header = Header::new_gnu();
header.set_cksum();
Collaborator

I think calling set_cksum should always be the last step, but here you can probably just drop it.
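
A small sketch of why the checksum belongs last, assuming only the standard tar header layout (512-byte block, checksum field at bytes 148..156, computed with that field treated as spaces). The functions below are illustrative stand-ins written against a raw byte array, not the tar-rs implementation.

```rust
/// Size of a tar header block in bytes.
const BLOCK: usize = 512;
/// Byte range of the checksum field inside the header.
const CKSUM: std::ops::Range<usize> = 148..156;

/// Sums all header bytes, counting the checksum field itself as spaces.
fn compute_cksum(header: &[u8; BLOCK]) -> u32 {
    header
        .iter()
        .enumerate()
        .map(|(i, &b)| if CKSUM.contains(&i) { b' ' as u32 } else { b as u32 })
        .sum()
}

/// Writes the checksum as six octal digits, a NUL, and a space.
fn set_cksum(header: &mut [u8; BLOCK]) {
    let sum = compute_cksum(header);
    let field = format!("{:06o}\0 ", sum);
    header[CKSUM].copy_from_slice(field.as_bytes());
}

/// Checks whether the stored checksum matches the current header bytes.
fn cksum_is_valid(header: &[u8; BLOCK]) -> bool {
    let stored = std::str::from_utf8(&header[CKSUM.start..CKSUM.start + 6])
        .ok()
        .and_then(|s| u32::from_str_radix(s, 8).ok());
    stored == Some(compute_cksum(header))
}
```

Any field written after `set_cksum` changes the byte sum, so `cksum_is_valid` turns false until the checksum is recomputed. An early call is therefore wasted at best and misleading at worst.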

@mitrandir77
Contributor Author

@cgwalters Thanks for all the comments. I have fixed the coding style issues you raised making the code hopefully easier to review for the next round.

@@ -768,7 +768,7 @@ fn prepare_header_path(dst: &mut dyn Write, header: &mut Header, path: &Path) ->
             Ok(s) => s,
             Err(e) => str::from_utf8(&data[..e.valid_up_to()]).unwrap(),
         };
-        header.set_path(truncated)?;
+        header.set_truncated_path_for_gnu_header(&truncated)?;
Collaborator

cgwalters commented on Dec 5, 2024

One thing made me uncertain: this is GNU-specific behavior, but we're operating on a generic tar &mut Header here.

Looking at it, the code today inside the generic _set_path does

        if let Some(ustar) = self.as_ustar_mut() {
            return ustar.set_path(path);
        }

And bypasses all of this. So I think what we have here is OK, but what I feel would improve things here is if we did a cast as_gnu_mut() and operated on that instead or so.

But anyway, again, this is not a new issue in this code.

Contributor Author

> but what I feel would improve things here is if we did a cast as_gnu_mut() and operated on that instead or so

I agree that it's a good idea, but unfortunately it would be a major refactor. From what I can see, the problem starts at the prepare_header_path function, which acts as a de facto set_path that can handle GNU headers. I think it's done this way because, unlike the other set_path functions (the vanilla and ustar ones), this one doesn't just modify the header structure but also emits a new entry into the archive containing the long path.
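
For readers unfamiliar with that extra entry, the sketch below builds one by hand using the GNU long-name convention: a header named `././@LongLink` with type flag `L`, followed by the NUL-terminated full path padded to a 512-byte boundary. It is a simplified std-only illustration, not the tar-rs code; `long_name_entry` and `write_octal` are hypothetical helpers, and the mode, uid, gid, and mtime values are placeholders chosen for the example.

```rust
/// Size of a tar block in bytes.
const BLOCK: usize = 512;

/// Writes `value` as zero-padded octal digits followed by a NUL terminator.
/// Panics if the value does not fit the field (acceptable for a sketch).
fn write_octal(field: &mut [u8], value: u64) {
    let width = field.len() - 1;
    let digits = format!("{:0width$o}", value, width = width);
    field[..width].copy_from_slice(digits.as_bytes());
    field[width] = 0;
}

/// Builds the extra entry (header block + padded data) that carries a long path.
fn long_name_entry(path: &[u8]) -> Vec<u8> {
    let mut header = [0u8; BLOCK];
    header[..13].copy_from_slice(b"././@LongLink"); // conventional entry name
    write_octal(&mut header[100..108], 0o644); // mode (placeholder value)
    write_octal(&mut header[108..116], 0); // uid (placeholder value)
    write_octal(&mut header[116..124], 0); // gid (placeholder value)
    write_octal(&mut header[124..136], path.len() as u64 + 1); // size incl. NUL
    write_octal(&mut header[136..148], 0); // mtime (placeholder value)
    header[156] = b'L'; // type flag: GNU long name
    header[257..263].copy_from_slice(b"ustar "); // GNU magic
    header[263..265].copy_from_slice(b" \0"); // GNU version

    // Checksum goes last: sum of all bytes with the checksum field as spaces.
    let sum: u32 = header
        .iter()
        .enumerate()
        .map(|(i, &b)| if (148..156).contains(&i) { 32 } else { b as u32 })
        .sum();
    header[148..156].copy_from_slice(format!("{:06o}\0 ", sum).as_bytes());

    // Data: the full path plus a NUL, padded with zeros to a block boundary.
    let mut out = header.to_vec();
    out.extend_from_slice(path);
    out.push(0);
    let rem = out.len() % BLOCK;
    if rem != 0 {
        out.resize(out.len() + BLOCK - rem, 0);
    }
    out
}
```

The real file entry follows this one in the archive, and its own header carries only the first 100 bytes of the path in the name field, which is where the truncated ".." discussed in this PR ends up.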
