[Feature Request]: case insensitive path comparison option for Windows #228

Open
1 task done
dhananjay-gune opened this issue Aug 9, 2024 · 6 comments
@dhananjay-gune

Feature description

BitInputArchive's contains() and find() should have an ignoreCase option.
Currently, the contains() and find() methods of the BitInputArchive class do a case-sensitive comparison.
E.g. if to_search is ABC\xyz.txt and the archive contains Abc\Xyz.txt, the entry won't be found.

BitFileCompressor.compressFiles() also fails if in_dir differs in case from the actual path on disk.
E.g. if the in_dir argument is C:\Temp\MYFOLDER whereas the actual path on disk is C:\TEMP\MyFolder, the archive creation fails with 'path not found' :-o

On Windows this poses a problem where the clients can specify the file name (path param) in any case (lower, upper, camel, whatever) and the underlying api must work.

There might be other places where the path comparison is done case sensitively.

Please find a way so that users don't have to worry about case sensitivity in any operation or comparison involving paths or file names, e.g. via a platform-specific #define or something similar.


@dhananjay-gune
Author

Currently I am implementing a workaround like:

// Open the archive and scan its entries, comparing each path case-insensitively.
BitArchiveReader reader{ lib, archivePath, archiveFormat };
for (const auto& entry : reader)
{
    const auto entryInArchive = entry.path();
    // lstrcmpi is the Win32 case-insensitive string comparison; it returns 0 on a match.
    if (lstrcmpi(entryInArchive.c_str(), entryToSearch.c_str()) == 0)
    {
        return true;
    }
}
return false;

I have used lstrcmpi to compare the strings.

@rikyoz
Owner

rikyoz commented Aug 9, 2024

Hi!

> BitInputArchive's contains() and find() should have an ignoreCase option.
> Currently, the contains() and find() methods of the BitInputArchive class do a case-sensitive comparison.
> E.g. if to_search is ABC\xyz.txt and the archive contains Abc\Xyz.txt, the entry won't be found.

I think it is a useful feature, I'll definitely add it, possibly in the next v4.1.

My only doubt is whether to make it a build-time option (e.g., like BIT7Z_AUTO_FORMAT) or a runtime argument for the contains() and find() functions. I'll need to evaluate which is best.
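
To make the two options concrete, here is a minimal sketch of how a build-time default and a runtime argument could coexist. Everything in it is hypothetical and not part of bit7z: the CaseSensitivity enum, the EXAMPLE_CASE_INSENSITIVE_PATHS macro, and the pathsEqual and containsPath helpers are made up for illustration. Plain std::string stands in for the library's string type, and the comparison only folds ASCII letters.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical names throughout: none of these are part of bit7z.
enum class CaseSensitivity { Sensitive, Insensitive };

// Build-time default: defining EXAMPLE_CASE_INSENSITIVE_PATHS (a made-up macro)
// flips the default for every call that does not pass an explicit value.
#ifdef EXAMPLE_CASE_INSENSITIVE_PATHS
constexpr CaseSensitivity kDefaultCase = CaseSensitivity::Insensitive;
#else
constexpr CaseSensitivity kDefaultCase = CaseSensitivity::Sensitive;
#endif

// ASCII-only comparison; real Windows paths would need Unicode-aware folding.
inline bool pathsEqual(const std::string& a, const std::string& b, CaseSensitivity cs) {
    if (cs == CaseSensitivity::Sensitive) return a == b;
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Runtime option: the caller can override the build-time default per call.
inline bool containsPath(const std::vector<std::string>& entries,
                         const std::string& target,
                         CaseSensitivity cs = kDefaultCase) {
    return std::any_of(entries.begin(), entries.end(),
                       [&](const std::string& e) { return pathsEqual(e, target, cs); });
}
```

With this shape, a call such as containsPath(entries, target, CaseSensitivity::Insensitive) opts in explicitly, while callers that omit the argument follow whatever the build was configured with.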

> BitFileCompressor.compressFiles() also fails if in_dir differs in case from the actual path on disk.
> E.g. if the in_dir argument is C:\Temp\MYFOLDER whereas the actual path on disk is C:\TEMP\MyFolder, the archive creation fails with 'path not found' :-o
>
> On Windows this poses a problem where the clients can specify the file name (path param) in any case (lower, upper, camel, whatever) and the underlying api must work.
>
> There might be other places where the path comparison is done case sensitively.

This is probably a bit trickier than contains() and find(), since this behavior is due to the implementation of std::filesystem::path (https://stackoverflow.com/questions/61351236/lexical-compare-stdfilesystempath-case-insensitive), which bit7z uses internally for paths.

But I'll try to find a workaround.
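
As an illustration of the std::filesystem::path behavior mentioned above: operator== on paths is a lexical, case-sensitive comparison, even on Windows. The sketch below shows one possible case-insensitive lexical comparison. The foldedGeneric and equalIgnoringCase helpers are hypothetical names, not bit7z functions, and the folding is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: normalizes the path lexically, converts it to its
// generic (forward-slash) form, and lower-cases ASCII letters.
inline std::string foldedGeneric(const fs::path& p) {
    std::string s = p.lexically_normal().generic_string();
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: purely lexical, case-insensitive path equality.
// It never touches the disk, so it also works for paths that do not exist.
inline bool equalIgnoringCase(const fs::path& a, const fs::path& b) {
    return foldedGeneric(a) == foldedGeneric(b);
}
```

When both paths refer to existing files or directories, std::filesystem::equivalent is an alternative: it asks the file system whether the two paths resolve to the same entity, so it follows the case rules of the actual volume instead of a lexical rule.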

Anyway, thank you for the feature request!

@dhananjay-gune
Author

I also discovered that BitFileExtractor.extractMatching() gives an error if the case doesn't match.
Is there a way to extract a given directory entry from an archive in a case insensitive way?

@rikyoz
Owner

rikyoz commented Aug 9, 2024

> I also discovered that BitFileExtractor.extractMatching() gives an error if the case doesn't match.

Yeah, this is the expected behavior, as the wildcard matching is performed treating paths as strings rather than filesystem paths (and by default, string/char comparisons are case-sensitive). But it should be possible to allow case-insensitive matching, of course.

Actually, there are also some private functions called extractMatchingFilter that are used to implement all the "matching" extraction functions.
They take any generic "filtering" function (std::function< bool( const tstring& ) >), where the return value is true if the item must be extracted, false otherwise.
I'm starting to think that I should make these functions public, as they would help in cases like yours, or in general when the matching is not performed via a case-sensitive wildcard or regex pattern.

This latter change might be available already in the next v4.0.8, the other changes might require more time.
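
To show how a generic filtering function of the shape std::function< bool( const tstring& ) > could cover the case-insensitive scenario, here is a standalone sketch. It does not call bit7z: tstring is aliased to std::string as a stand-in (the library's tstring may be a wide string on Windows), and PathFilter, makeSubTreeFilter and selectIndices are hypothetical names. The folding is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using tstring = std::string;  // stand-in for the library's string type
using PathFilter = std::function<bool(const tstring&)>;

// Hypothetical factory: the returned filter accepts `root` itself and every
// path below it, ignoring ASCII case and accepting either path separator.
inline PathFilter makeSubTreeFilter(tstring root) {
    auto lower = [](tstring s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    };
    root = lower(std::move(root));
    return [root, lower](const tstring& path) {
        const tstring p = lower(path);
        // The folded path must start with the folded root...
        if (p.compare(0, root.size(), root) != 0) return false;
        // ...and either end there or continue with a separator.
        return p.size() == root.size() || p[root.size()] == '\\' || p[root.size()] == '/';
    };
}

// Hypothetical helper: collects the indices of all entries the filter accepts.
inline std::vector<std::uint32_t> selectIndices(const std::vector<tstring>& entryPaths,
                                                const PathFilter& filter) {
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 0; i < entryPaths.size(); ++i) {
        if (filter(entryPaths[i])) indices.push_back(i);
    }
    return indices;
}
```

The resulting index list has the same role as the indices passed to an index-based extraction call, so the filter only decides what to extract and leaves the extraction itself unchanged.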

> Is there a way to extract a given directory entry from an archive in a case insensitive way?

For the time being, you should be able to extract it by index, e.g.:

BitArchiveReader reader = BitArchiveReader{ lib, archivePath, archiveFormat };
for (const auto& entry : reader)
{
    auto entryInArchive = entry.path();
    bool isEqual = lstrcmpi(entryInArchive.c_str(), entryToSearch.c_str()) == 0;
    if (isEqual)
    {
        // extractTo takes an array of indices of the items to be extracted
        reader.extractTo( "<outPath>", { entry.index() } );
        break;
    }
}

@dhananjay-gune
Author

Thanks! I'll try that.

@dhananjay-gune
Author

> Thanks! I'll try that.

It worked! Thanks.
I have used the PathMatchSpec() Win32 API to do the wildcard matching:

// subTreeRoot is a directory inside the archive
tstring wildcardedSubTreeRoot = subTreeRoot + BIT7Z_STRING("\\*");
std::vector<uint32_t> indices; // indices of the matching entries
for (const auto& entry : extractor)
{
    auto entryInArchive = entry.path();
    BOOL matches = PathMatchSpec(entryInArchive.c_str(), subTreeRoot.c_str());
    if (!matches)
    {
        matches = PathMatchSpec(entryInArchive.c_str(), wildcardedSubTreeRoot.c_str());
    }
    if (matches)
    {
        indices.push_back(entry.index());
    }
}
