-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment: the new parser based on winnow
#283
base: main
Are you sure you want to change the base?
Conversation
Performance Benchmark Report
Benchmarks removed:
Benchmarks added:
|
This sounds really promising! I haven't looked at your changes yet, but the performance speed-up would be very welcome if we can (over time) get to parity with what we've got working so far and if we feel good about its maintainability moving forward. I'm keen to read up on winnow and learn more. Have you found you've need to make significant changes to the AST structs so far? I've seen other projects successfully experiment with alternate parsers (behind a feature flag) and then be able to A/B compare them, and if they find it's the right call, flip over the default when ready. |
There are no AST changes. The current ast maps perfectly to the yacc posix grammar description. So the second parser is a drop-in replacement except the tokenizer call |
I found it easy to resolve issues with this new parser such as that one we had with heredocs before. Each part of the grammar is completely separated from the rest in its own function similar to peg's |
Moving to a true streaming parser will help other issues we've had (e.g., shopt actually changing tokenizing/parsing semantics for subsequent commands by enabling/disabling extglob or similar). |
I haven't looked through the full details of this draft but am supportive of us moving forward with adding this as an experimental secondary parser. That will allow us to incrementally mature it and bring it to full parity, get testing set up in a way that we're happy with, etc. -- and then we could figure out the criteria for switching the default parser. @39555 -- what do you think about preparing a separate small (first step) PR to enable a command-line option to select an alternate parser, and then just add a stub version of the new parser, and a basic test for it -- without adding all the real implementation? That will allow us to make sure we find all the places that code is directly calling into the parser and tokenizer today, and clean that up so it will be easier to switch it. |
winnow
is the rethought, user friendly fork ofnom
. I played with it for some time implementing the bash grammar and I think you might find it interesting to see how it looks!I managed to build a solid zero-copy (except for escaping where I'm using
Cow
) parser. It already has a 2x speedup over the current uncached PEG parser and performs on par with the cached version. Basically it is a hand-rolled parser built fromwinnow
building blocks compared to a framework or a parser generator.Its advantages:
An example of the trace
It allows for more imperative code to control special cases such as here docs.
You can use whatever error type you want. By default it uses zero copy
ContextError
with the span and message information. You can attach custom error messages withparser.context("foo")
. The author of thekdl
andmiette
usesmiette
in the kdl parser https://github.com/kdl-org/kdl-rs/blob/05a4c4fce1a25727e15f4e2f873d0e0ca076c328/src/v2_parser.rs#L67ariadne
ormiette
small amount of allocations due to the usage of
&mut
everywhere.I decided to merge tokenizing and parsing into one step. It has significantly reduced the amount of code compared to the old tokenizer things.
Things that are not fully implemented:
extended_tests
and basically any kind of expression. Im currently working on PR for supporting expression parsing in winnow Pratt parsing support winnow-rs/winnow#131