[Perf] Use heuristics to avoid allocations in Sanitizer::str_till_eol #2563
Overall, I'm in favor of this PR. It would be helpful to add test cases for this specific parser, especially for when the heuristic is and isn't taken.
if !contains_unsafe_chars {
    Ok((after, before))
} else {
    recognize(Self::till(value((), Sanitizer::parse_safe_char), Self::eol))(before)
Does this need to use a `till` operator until `Self::eol`, since this is known to be a single-line string?
Good point; I'm not sure, but since I'm not super savvy with `nom`, I preferred to leave it intact.
    },
)(string)
// A heuristic approach is applied here in order to avoid
// costly parsing operations in the most common scenarios.
Nit. Can you explain the heuristic in the comment?
Added to my TODO (as with the `Program`-related comment, I'd suggest adding it separately to avoid merge-related issues).
let contains_unsafe_chars = !before.chars().all(is_char_supported);

if !contains_unsafe_chars {
    Ok((after, before))
Is the goal to avoid allocations in this method?
LGTM
@vicsn I updated the original branch with 2 commits addressing the notes on extra docs and the redundant use of @d0cd as for extra tests, note that
@ljedrz on initial scan, I don't see a test case for a multi-line comment, multiple single-line comments, multiple multi-line comments, and interleavings of the two. I'll leave it to your discretion, but I would at least recommend a multi-line comment and multiple single-line comments. I also saw that you use The reason I am being so insistent on comments is that these parsers are quite sensitive and not many people have a deep expertise in
While not a big deal (extra test cases, docs, and one optimization), this PR was still missing the 4 extra commits mentioned at the end. I can include them in a follow-up shortly.
@ljedrz apologies I overlooked that I had to update my branch |
Motivation
Replaces #2517
Original PR message:
(transferred from ProvableHQ#6, and asking @d0cd for a review as suggested there)
The Sanitizer is used very prominently in our parsing functions, and it is also a source of many allocations, most of which are temporary and avoidable.
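To illustrate the kind of fast path discussed in the review, here is a minimal, standalone sketch. It is not the snarkVM implementation: `is_char_supported` below is a hypothetical stand-in (printable ASCII plus tab), the function is a free function rather than a `Sanitizer` method, and the slow path simply reports the first offending character instead of delegating to a `nom` parser. The point is only that, when every character on the line is safe, the line can be returned as a borrowed slice without any allocation or combinator work.

```rust
/// Hypothetical stand-in for the `is_char_supported` check seen in the diff.
/// Assumption: a character is "safe" if it is a tab or printable ASCII.
fn is_char_supported(c: char) -> bool {
    c == '\t' || (' '..='~').contains(&c)
}

/// Returns `(remaining, line)`, where `line` is the input up to (but not
/// including) the first `\n`, and `remaining` is everything after it.
/// Both are borrowed slices of `input`, so the fast path does not allocate.
fn str_till_eol(input: &str) -> Result<(&str, &str), String> {
    // Split at the first newline; without one, the whole input is the line.
    let (before, after) = match input.find('\n') {
        Some(idx) => (&input[..idx], &input[idx + 1..]),
        None => (input, ""),
    };

    // Heuristic fast path: a single scan over the line. In the common case
    // every character is supported and the slices are returned as-is.
    if before.chars().all(is_char_supported) {
        return Ok((after, before));
    }

    // Slow path (simplified for this sketch): locate the first unsupported
    // character and report it. The error string is the only allocation.
    match before.char_indices().find(|&(_, c)| !is_char_supported(c)) {
        Some((pos, bad)) => Err(format!(
            "unsupported character {:?} at byte offset {}",
            bad, pos
        )),
        // Unreachable in practice, since the scan above found such a character.
        None => Ok((after, before)),
    }
}
```

Under these assumptions, calling `str_till_eol("abc\ndef")` takes the fast path and yields the two slices `"def"` and `"abc"`, while a line containing a character outside the supported set falls through to the slow path and returns an error.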
The potential perf improvements are quite large, and I've measured them both with a 15-minute run of a --dev node and using hyperfine on a small binary that parsed all the valid .aleo programs currently present in the snarkVM codebase.
dev node:
parsing all .aleo programs using Program::from_str:
Test Plan
CI run link