adding archive entry paths #3638
base: main
This looks pretty clean, nice job!
pkg/handlers/archive.go
Outdated
@@ -101,37 +102,37 @@ func (h *archiveHandler) HandleFile(ctx logContext.Context, input fileReader) ch
var ErrMaxDepthReached = errors.New("max archive depth reached")

// openArchive recursively extracts content from an archive up to a maximum depth, handling nested archives if necessary.
// It takes a reader from which it attempts to identify and process the archive format. Depending on the archive type,
// it either decompresses or extracts the contents directly, sending data to the provided channel.
// It takes a string representing the path to the archive and a reader from which it attempts to identify and process the archive format.
This comment reads "a string" but the actual function takes a string slice. Which is correct?
I'll fix that. That comment was from a previous version.
reader fileReader,
dataOrErrChan chan DataOrErr,
) error {
ctx.Logger().V(4).Info("Starting archive processing", "depth", depth)
defer ctx.Logger().V(4).Info("Finished archive processing", "depth", depth)
ctx.Logger().V(4).Info("Starting archive processing", "depth", len(archiveEntryPaths))
imo just attach the paths here now that you have them
Do you mean change `depth` to `paths` in the logs?
return h.openArchive(ctx, depth+1, rdr, dataOrErrChan)
// Note: We're limited in our ability to add file names to the archiveEntryPath here, as the decompressor doesn't have access to a fileName value.
// We add a empty string so we can keep track of the archive depth.
return h.openArchive(ctx, append(archiveEntryPaths, ""), rdr, dataOrErrChan)
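For readers following the hunk above, here is a minimal sketch of how the slice length can stand in for the old `depth` counter. It is not the handler's actual code: `descend`, `maxDepth`, its value of 5, and the `>=` comparison are all hypothetical, chosen only to illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
)

// Mirrors the error name from the diff; lowercase because this is a standalone sketch.
var errMaxDepthReached = errors.New("max archive depth reached")

// maxDepth is a hypothetical limit, not the project's real value.
const maxDepth = 5

// descend simulates unwrapping `layers` compressed layers. Each layer has no
// known file name, so an empty string is appended purely to record the depth.
func descend(entryPaths []string, layers int) ([]string, error) {
	if len(entryPaths) >= maxDepth {
		return entryPaths, errMaxDepthReached
	}
	if layers == 0 {
		return entryPaths, nil
	}
	return descend(append(entryPaths, ""), layers-1)
}

func main() {
	paths, err := descend(nil, 3)
	fmt.Println("depth:", len(paths), "err:", err)

	_, err = descend(nil, 10)
	fmt.Println("hit limit:", errors.Is(err, errMaxDepthReached))
}
```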
If we ever end up stringifying the slice of path parts, an empty string isn't going to be maximally visually obvious as another level of depth, compared to something like a `?`. (Compare `some/path///file.txt` to e.g. `some/path/?/?/file.txt`.) What do you think of using a non-empty marker like `?` instead?
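The comparison above can be reproduced with a small sketch, assuming the parts are joined naively with `strings.Join`. The `stringify` helper and its `marker` parameter are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// stringify joins path parts with "/", substituting marker for any empty
// part. Passing "" as the marker leaves empty parts untouched.
func stringify(parts []string, marker string) string {
	out := make([]string, len(parts))
	for i, p := range parts {
		if p == "" {
			p = marker
		}
		out[i] = p
	}
	return strings.Join(out, "/")
}

func main() {
	parts := []string{"some", "path", "", "", "file.txt"}
	fmt.Println(stringify(parts, ""))  // empty placeholders collapse into bare slashes
	fmt.Println(stringify(parts, "?")) // each placeholder stays visible as its own level
}
```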
Commit 36bdef0 fixes an oversight from an earlier version and is relevant to this discussion.
The main change is using `file.NameInArchive` instead of `file.Name()`. The difference is that `file.NameInArchive` contains the relative path inside the archive that is being extracted. This does a few things:
- Previously, I mistakenly treated all files during an extraction operation as if they were in a flat directory and didn't preserve the actual directory structure. I didn't realize the `file.Name()` operation just grabbed the filename, not the path. That's fixed and we now have complete relative file paths in all supported archives.
- This change makes it abundantly clear how a user would navigate an archive to get to the actual file. For example: if you have an archive file containing another archive named `archive.tar.gz` with a file named `secret.txt`, it would return this in the archive entry path: `archive.tar.gz/archive/secret.txt`. When you manually double-click to unarchive `archive.tar.gz`, it tosses all of the contents into a new folder named `archive`, and then you'd see `secret.txt`. It's super clean and clear. This method still uses the `""` to track depth during decompression, but in the `filepath.Join()` operation, empty strings are ignored, which is exactly what we want to reconstruct the relative file path. If we change to add `?` or anything else, we'd need to write a custom function to strip those out during the `filepath.Join()`. Since we're able to keep accurate track of `depth` and relative file path, I'd suggest just leaving as is.
- One other note: I only updated the error logs to include `file.NameInArchive` b/c I didn't want to bloat our non-error logs with the entire file path. This means the log at the very beginning of the `extractorHandler` function only has `file.Name()`. Not sure if that was the correct call or not.
Description:
Related to and inspired by #1551. This PR adds a new ExtraData field named `Archive Entry Path` that contains the relative path of the file containing a secret within an archive file.
Note: When decompressing files that only contain one file (ex: `.gz`, `.xz`, `.lz4`, etc), it won't list the decompressed file's name due to a limitation in the archiver library we use. However, since it only results in one decompressed file, it should be self-explanatory to the user.
For example:
If there is a file in secret.txt, we'll see:
Archive Entry Path: secret.zip.gz/secret.txt
File: /path/to/archive.tar
Checklist:
- `make test-community`
- `make lint` (this requires golangci-lint)