Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spaces in filename cause ReleaseEntry Regex to fail #1283

Closed
frederikolsen88 opened this issue Apr 11, 2018 · 23 comments · May be fixed by #1401
Closed

Spaces in filename cause ReleaseEntry Regex to fail #1283

frederikolsen88 opened this issue Apr 11, 2018 · 23 comments · May be fixed by #1401

Comments

@frederikolsen88
Copy link

frederikolsen88 commented Apr 11, 2018

I'm in a situation where Squirrel updating fails when iterating over one particular release entry

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

using this Regex:

^([0-9a-fA-F]{40})\s+(\S+)\s+(\d+)[\r]*$

The culprit appears to be that (\S+) will only match non-whitespace characters. I've played a bit around with the Regex Tester at http://regexstorm.net/tester, and my suggestion would be to change the Regex to something similar to

^([0-9a-fA-F]{40})\s+(.*).shasum[\s+](\d+)[\r]*$

We don't really want to be forced to change the filenames or escape the spaces, so I'm interested in a solution where we can avoid that. Thanks in advance.

@frederikolsen88 frederikolsen88 changed the title Spaces in filename causes ReleaseEntry Regex to fail Spaces in filename cause ReleaseEntry Regex to fail Apr 11, 2018
@frederikolsen88
Copy link
Author

@paulcbetts, would you accept a PR incorporating the suggestion above, or would there be caveats to that approach?

@anaisbetts
Copy link
Contributor

I'm confused, why is this file in your RELEASES file?

@frederikolsen88
Copy link
Author

Sorry, I realize now I was a bit unclear.

E6651B768DB82076E7113086558548B4DA4F8F6F Artifact - BoxHeader - e7ee795c-99e1-4ecb-a72c-9a41461d8198.xml.shasum 8669

are the contents of the .shasum corresponding with the file in question in one of our delta packages. It is when trying to update using a delta package that our users get an "Invalid release entry" exception. If there are changes to apply to a file with spaces in the filename, that is.

So while the RELEASES file isn't in play, I guess the error message is worded like that is because you're using the same Regex to parse both .shasum files and the RELEASES file.

@JohnThomson
Copy link
Contributor

Having the same problem. Any time Squirrel decides to use a .diff (or .bsdiff) to represent the difference between old and new versions of a file, it generates a corresponding .shasum file along with the .diff. Here's an example of the shasum contents for one of our packages:

09E36653DEFCAFEDA55115AD81AC84E1B8D4A1FF Install Bloom Literacy Fonts.exe.shasum 12992

When applying the patch in DeltaPackage.applyDiffToFile, Squirrel calls VerifyPatchedFile(), which reads the .shasum file and passes the contents to ReleaseEntry.ParseReleaseEntry(), which fails to parse it because it can't match the spaces in the file name.

Squirrel then concludes that the patch is faulty, aborts the incremental update, and downloads the whole package, defeating the incremental update feature entirely.

I agree that Frederik's fix (or even just changing (\S+) to (.+) would fix the problem.

@JohnThomson
Copy link
Contributor

I suppose to be sure there is some filename present, and that if there are multiple spaces at the end none of them are put in the filename group, (\S.*shasum) would be better.

@JohnThomson
Copy link
Contributor

No, that won't work for the existing usages with .nupkg. We can however expect at least two non-space characters, since all usages involve either .shasum or .nupkg. (\S.*\S) should work, then. (At least, it passes all the existing ReleaseEntry unit tests.)

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

Wouldn't (\w+)\s(.+)\s(\d+) be much simpler to match the hash at the beginning and the length at the end?

@JohnThomson
Copy link
Contributor

(a) Paul's regex requires exactly 40 of the expected characters in the sha, a better validation.
(b) ^ and $ are needed to ensure it matches the whole file content for validation
(c) If there is more than one white-space character surrounding the file name, your version will include the extra white space in the filename, which is probably not good.
(d) Your version will match something with nothing but white space between the sha and the length, as long as there are at least three spaces
(e) A minimal change to fix the problem is usually wise if one is not fully familiar with a piece of code and all its purposes.

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

I assumed we were talking about matching per-line. You're right about the 3 spaces thing. I didn't even see the original regex which is a bit more complicated than it needs to be.

It does seem like a pretty easy fix, especially if you know each line begins with 40 upper-case hex characters for the hash and ends with a set of numbers for length and has a filename in the middle delineated with spaces before/after. I'm surprised it's still open. I was subscribed to it for some reason, but I can't even remember if I had the issue.

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

How about adding \S inside the filename group to ensure there's at least one non-space character and then allow multiple spaces before/after like this? Also, I made the filename match non-greedy so it won't match extra spaces between the filename and the hash/length.

^([0-9a-fA-F]{40})\s+(\S.+?)\s+(\d+)\r*$

I did a bunch of tests in The Regex Coach and it seems to work well, and it's very similar to the original.

@JohnThomson
Copy link
Contributor

JohnThomson commented Oct 23, 2018

It will probably work for much the same reason as mine: filenames that will really occur here will have at least two characters. However, in the case of a single-character file name followed by two spaces, it will include one of the spaces in the filename. Using * after the dot would be even better.
Both your version and one with * instead also pass all the existing unit tests. But I think the * variation, which will allow a single character file name, is the best suggestion yet.

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

\S.? means one non-space and zero or one single character. You probably meant .*? or something, like a non-greedy zero-to-any length match.

This one allows single character filenames, at the expense of requiring a single space before/after the filename: ^([0-9a-fA-F]{40})\s(\S[\S ]*?)\s(\d+)\r*$. It also stays closer to the original's definition of what a filename is (\S) plus spaces.

I don't have a dev env setup for this project, so I'm just trying to help with regex. :) I'll probably let this be my last comment on the matter.

@JohnThomson
Copy link
Contributor

No, I tried to write openparen-backslash-S-dot-star-questionmark-closeparen and markdown interpreted it as a request for italics. So I edited the comment, but unfortunately you saw the original. Sorry.

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

If you surround your "code" with backticks, you don't have to deal with the escapes. It'll also do a mono-space font with a gray background instead of italics like you're using.

Like this `some \code here` makes it look like: some \code here

@JohnThomson
Copy link
Contributor

OK, so what I intended to suggest was (\S.*?), or in full ^([0-9a-fA-F]{40})\s+(\S.*?)\s+(\d+)\r*$
Thanks.

@JohnThomson
Copy link
Contributor

Your idea of allowing only spaces, not other whitespace, is also a good one. Could still use \s+ before and after, counting on the ? to prevent a trailing space being included in the filename. So ^([0-9a-fA-F]{40})\s+(\S.[\S ]*?)\s+(\d+)\r*$ would result...which also passes unit tests.

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

Yours has an error that makes it not work for single character filenames. You have an errant period inside the filename match after (\S. That requires at least 2 characters for a filename and the 2nd character can be anything, including whitespace like tabs/vertical tab/etc.

@JohnThomson
Copy link
Contributor

Bother. A careless mistake. The version I actually unit-tested is ^([0-9a-fA-F]{40})\s+(\S[\S ]+?)\s+(\d+)[\r]*$

@cbenard
Copy link
Contributor

cbenard commented Oct 23, 2018

That one doesn't match single character filenames either. Your [\S ]+ in the middle should be *, not +. If you use +, it requires at least one character, in addition to the \S which is another character to begin.

@JohnThomson
Copy link
Contributor

Must have been really sleepy yesterday. ^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$.

@cbenard
Copy link
Contributor

cbenard commented Oct 24, 2018

That one looks good. Consider it co-signed.

^([0-9a-fA-F]{40})\s+(\S[\S ]*?)\s+(\d+)[\r]*$

Now if we can only get @paulcbetts to look at this issue still open from April... :)

@cbenard
Copy link
Contributor

cbenard commented Oct 24, 2018

Got the PR ready for you @paulcbetts after @JohnThomson and I worked through the regex. I hope you're able to accept the PR.

@Thieum
Copy link
Contributor

Thieum commented May 3, 2019

@shiftkey duplicate of #713, this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants