Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Use heuristics to avoid allocations in Sanitizer::str_till_eol #2563

Merged
merged 3 commits into from
Nov 12, 2024

Conversation

vicsn
Copy link
Contributor

@vicsn vicsn commented Oct 22, 2024

Motivation

Replaces #2517

Original PR message:

(transferred from ProvableHQ#6, and asking @d0cd for a review as suggested there)

The Sanitizer is used very prominently in our parsing functions, and it is also a source of many allocations, most of which are temporary and avoidable.

The potential perf improvements are quite large, and I've measured them both with a 15-minute run of a --dev node and using hyperfine on a small binary that parsed all the valid .aleo programs currently present in the snarkVM codebase.

dev node:

all allocs are down ~15%, of which almost all are temporary
in Program::from_str specifically, allocs are reduced by ~64%, of which temp allocs ~88%

parsing all .aleo programs using Program::from_str:

allocs are reduced by ~41%
growth reallocs are down ~70%
runtime is reduced by ~31%

Test Plan

CI run link

Copy link
Contributor

@d0cd d0cd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm in favor of this PR. It would be helpful to add test cases for the this specific parser, especially for when the heuristic is and isn't taken.

if !contains_unsafe_chars {
Ok((after, before))
} else {
recognize(Self::till(value((), Sanitizer::parse_safe_char), Self::eol))(before)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to use a till operator until Self::eol since this is knows to be a single line string?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point; I'm not sure, but since I'm not super savvy with nom, I preferred to leave it intact

},
)(string)
// A heuristic approach is applied here in order to avoid
// costly parsing operations in the most common scenarios.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Can you explain the heuristic in the comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to my TODO (as with the Program-related comment, I'd suggest to add it separately to avoid merge-related issues)

let contains_unsafe_chars = !before.chars().all(is_char_supported);

if !contains_unsafe_chars {
Ok((after, before))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to avoid allocations in this method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the allocations that are avoided are primarily:

@zosorock zosorock added the enhancement New feature or request label Oct 26, 2024
Copy link
Collaborator

@zkxuerb zkxuerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ljedrz
Copy link
Collaborator

ljedrz commented Nov 5, 2024

@vicsn I updated the original branch with 2 commits addressing the notes on extra docs and the redundant use of eol. Feel free to cherry-pick those, they should apply cleanly.

@d0cd as for extra tests, note that str_till_eol is only ever used in parse_comment, and there are already several applicable test cases for it, including ones that would utilize the faster approach. Do you feel like those are missing any variants?

@d0cd
Copy link
Contributor

d0cd commented Nov 5, 2024

@ljedrz on initial scan, I don't see test case for a multi-line comment, multiple single line comments, multiple multi-line comments, and interleavings of the two. I'll leave it to your discretion, but I would at least recommend a multi-line comment and multiple single line comments.

I also saw that you use eoi in the update. Looks good to me, but I would also recommend explaining in a comment that this is valid because before is a single line string.

The reason I am being so insistent on comments is that these parsers are quite sensitive and not many people have a deep expertise in nom.

@ljedrz
Copy link
Collaborator

ljedrz commented Nov 6, 2024

@vics I addressed the comments in 2 new commits:

@zosorock zosorock added the v1.1.4 Canary release v1.1.4 label Nov 12, 2024
@zosorock zosorock merged commit 60a0aa5 into AleoNet:staging Nov 12, 2024
84 checks passed
@ljedrz
Copy link
Collaborator

ljedrz commented Nov 12, 2024

While not a big deal (extra test cases, docs, and one optimization), this PR was still missing the 4 extra commits mentioned at the end. I can include them in a follow-up shortly.

@vicsn
Copy link
Contributor Author

vicsn commented Nov 13, 2024

@ljedrz apologies I overlooked that I had to update my branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request v1.1.4 Canary release v1.1.4
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants