-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for string values #58
Conversation
Codecov Report
@@ Coverage Diff @@
## master #58 +/- ##
==========================================
+ Coverage 75.96% 76.17% +0.21%
==========================================
Files 22 22
Lines 4609 4625 +16
==========================================
+ Hits 3501 3523 +22
+ Misses 682 679 -3
+ Partials 426 423 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catches!
(It might keep the commit history a bit cleaner to rebase your branch if you need to pick up changes from master, rather than repeatedly merging? No real problem from my end if you prefer the merge approach.)
ion/tokenizer.go
Outdated
@@ -542,6 +542,9 @@ func (t *tokenizer) readString() (string, error) { | |||
if err != nil { | |||
return "", err | |||
} | |||
if t.isProhibitedControlChar(c) { | |||
return "", &SyntaxError{"Invalid character", t.pos} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.invalidChar(c)
?
ion/tokenizer.go
Outdated
@@ -668,6 +678,27 @@ func (t *tokenizer) readEscapedChar(clob bool) (rune, error) { | |||
return 0, &SyntaxError{fmt.Sprintf("bad escape sequence '\\%c'", c), t.pos - 2} | |||
} | |||
|
|||
func (t *tokenizer) isProhibitedControlChar(c int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t is unused here, so maybe worth dropping it and adding these functions to textutils.go?
…-triple-quote # Conflicts: # ion/tokenizer.go
My preference is to rebase while the PR is a draft and merge once the first review has happened. This means that reviewers will see a version of the code that was up-to-date with In short, if you have merges in your commit history when you go to request your first review, rebase to clean it up. When the PR is approved, we'll merge from master and re-review if large changes were required. Finally, we'll squash+merge back into |
@@ -1263,3 +1273,24 @@ func (t *tokenizer) unread(c int) { | |||
t.pos-- | |||
t.buffer = append(t.buffer, c) | |||
} | |||
|
|||
func isInvalidChar(c int) bool { | |||
if c < 0x00 || c > 0x1F { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took a bit of investigation to understand why a negative int
was considered a valid character. Taking a look at read()
, I found that it's returning -1
instead of an EOF
err like Buffer.ReadByte
does (docs). I'd be in favor of refactoring this behavior to align with the standard library in the future, but in the meantime I think that isInvalidCharacter
should treat EOF
(-1
) as an invalid character, simplifying this check to:
// Values lower than this are non-displayable ASCII characters
if c > 0x1F {
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a couple things easier treating EOF as a character instead of an error, and it's an internal-only API so it felt like a good trade-off. It's very possible that's just my old C habits speaking though. :) Feel free to refactor it if you think it improves readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for having this treat -1 as an invalid char. In both places we're calling it we're checking for -1 immediately after, which you could then remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified and created #60 to refactor read()
ion/tokenizer.go
Outdated
@@ -582,20 +585,27 @@ func (t *tokenizer) readLongString() (string, error) { | |||
if err != nil { | |||
return "", err | |||
} | |||
if isInvalidChar(c) { | |||
return "", &SyntaxError{"Invalid character", t.pos} | |||
} | |||
|
|||
switch c { | |||
case -1: | |||
return "", t.invalidChar(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're reporting an invalid character here when we encounter an EOF
, which may be confusing. Looks like we do the same thing in readString
above. In both cases, I suggest handling the EOF
case immediately after read()
out so we can provide a more informative error to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.invalidChar, as previously suggested, does the right thing here (and should probably have a better name :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
ok, err := t.skipEndOfLongString(t.skipCommentsHandler) | ||
if err != nil { | ||
return "", err | ||
} | ||
if ok { | ||
return ret.String(), nil | ||
} | ||
|
||
if startPosition == t.pos { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like skipEndOfLongString
reports whether it consumed the string ending, so I think you can replace this position check with an else {...}
attached to the if ok {...}
.
wasConsumed, err := t.skipEndOfLongString(t.skipCommentsHandler)
if err != nil {
return "", err
}
if wasConsumed {
return ret.String(), nil
} else {
// The ' was not part of a long string ending.
ret.writeByte(byte(c))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll also return false if it skipped over a ''' /* pause in the string */ '''
sequence, in which case you don't want to keep the '. :( This seems correct to me in the short term, skipEndOfLongString
should probably should be refactored to return a tri-state in the longer term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I almost refactored skipEndOfLongString
but then I decided to KISS.
#61 to change skipEndOfLongString
ion/tokenizer.go
Outdated
@@ -1263,3 +1273,24 @@ func (t *tokenizer) unread(c int) { | |||
t.pos-- | |||
t.buffer = append(t.buffer, c) | |||
} | |||
|
|||
func isInvalidChar(c int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name's a bit ambiguous at the top level. Maybe isInvalidStringChar
? It'd fit nicely alongside the other isXxx
functions in textutils.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that's embarrassing! I read this comment and thought you're suggesting to change the name of the function, totally misread it.
I originally had named it isProhibitedControlChar
. Changing it back
ion/tokenizer.go
Outdated
return true | ||
} | ||
|
||
func isWhiteSpaceChar(c int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isWhitespace
is similarly-named but works a bit differently. Maybe worth combining them? Otherwise maybe rename this one to isStringWhitespace
or something like that.
ion/tokenizer.go
Outdated
|
||
switch c { | ||
case -1, '\n': | ||
if isProhibitedControlChar(c) || c == '\n' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should explicitly handle the EOF
/-1
case here and below. isProhibitedControlChar
will catch it, but the method name makes me think it's only looking for ASCII control characters, which doesn't include -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but that means going back to this comment.
We can either change the function name to something more generic like invalidStringCharacter
or put the logic before this commit back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a subtle difference here. We've gone from:
if isInvalidChar(c) {
// Doesn't handle EOF, but could have based on the name
}
switch c {
case -1: // Handles EOF
to
if isProhibitedControlChar(c) {
// Handles EOF, but shouldn't based on the name.
}
I think isProhibitedControlChar
is a more precise/communicative name, so I'd like to keep it. But that means we need the explicit EOF check:
if c == -1 || c == '\n' || isProhibitedControlChar(c) {
// Handles EOF
}
I think this is especially helpful since someone new to the codebase could reasonably expect EOF to be handled by the
if err != nil {
// ...
}
above, since err
is how the standard library reports EOF
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Updated the pull request. Thanks.
This pull request contains fixes for 2 cases:
''' ion's fun! '''