Refactor: separate parsing and output generation via events #11

Open
tomtau opened this issue Jun 3, 2024 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@tomtau
Contributor

tomtau commented Jun 3, 2024

pest-parser/pest#885 (reply in thread)

tomtau mentioned this issue Jun 3, 2024
tomtau moved this to Todo in pest3 near-term Jun 3, 2024
tomtau added the help wanted Extra attention is needed label Jun 3, 2024
@tomtau
Contributor Author

tomtau commented Jun 15, 2024

For the typed API also:

Rules should be configurable somehow, in case someone wants to derive two parsers in the same module.

@Tartasprint
Contributor

I glanced over quick-xml's event API, since it was mentioned in pest-parser/pest#885 (reply in thread), and it looks nice!

A question I have: what kinds of events are intended for users to have?
I would guess non-silent rules and node tags (but looking at the pest3 grammar of grammars, it seems they were abandoned).

I'll now try to see whether that choice can be ambiguous from the event listener's point of view.
A dumb example is a rule containing a choice whose two sides share a common prefix, such as:

dumb_choice = other_rule - "a" | other_rule - "b"

When parsing dumb_choice, the listener will receive an other_rule event, but won't know whether it is starting the left side or the right side of the choice expression. Yet it doesn't matter that much, since that prefix could be factored out of the choice.

A thing I noticed in quick-xml is that elements with nested content get a start event and an end event (an empty element gets a single event). So for rules like:

operator = "+" | "-" | "/" | "*"

a single isolated event named operator would be fired; and for rules like:

expression = "[" ~ [..omitted..] ~ operator ~ [..omitted..] ~ "]"

the parser would fire a sequence similar to "start expression ... operator ... end expression".
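A minimal Rust sketch of that idea, loosely modelled on quick-xml's `Event::Start`/`Event::End`/`Event::Empty` variants. The `Rule` and `Event` types and `is_well_nested` are hypothetical names for illustration, not a pest3 API; the check only shows that such a stream has to nest like well-formed XML.

```rust
/// Hypothetical rule identifiers for the two example rules (not a generated pest3 type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Expression,
    Operator,
}

/// Hypothetical quick-xml-style event: rules containing other rule matches
/// produce a Start/End pair, leaf rules produce a single Empty event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event<'i> {
    Start(Rule),
    End(Rule),
    /// A leaf rule match together with the slice of input it matched.
    Empty(Rule, &'i str),
}

/// Returns true if every Start has a matching End in the right order,
/// i.e. the stream nests like well-formed XML.
pub fn is_well_nested(events: &[Event<'_>]) -> bool {
    let mut open: Vec<Rule> = Vec::new();
    for event in events {
        match *event {
            Event::Start(rule) => open.push(rule),
            Event::End(rule) => {
                // The End must close the most recently opened rule.
                if open.pop() != Some(rule) {
                    return false;
                }
            }
            Event::Empty(..) => {}
        }
    }
    open.is_empty()
}
```

For `expression`, a stream in this shape would be `Start(Expression)`, ..., `Empty(Operator, "+")`, ..., `End(Expression)`.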

But what about a recursive rule like:

separated = "0".."9" - ("," - separated)?

Here, applying the previous suggestion looks horrible to me. The behaviour is also not well defined, since the rule sometimes has nested elements in it and sometimes not.

I will continue this analysis, and my questions about which decisions have been made, later.

@tomtau
Contributor Author

tomtau commented Oct 19, 2024

> I glanced over quick-xml's event API, since it was mentioned in pest-parser/pest#885 (reply in thread), and it looks nice!

Yes, it's the closest to what I had in mind. There's also PEGTL:
https://github.com/taocpp/PEGTL/blob/main/doc/Contrib-and-Examples.md#examples
https://github.com/taocpp/PEGTL/tree/main/src/example/pegtl

which looks interesting, but it's a bit different.

> A question I have: what kinds of events are intended for users to have?

Probably rule-level events when a rule matches (though a debugger etc. would probably need other events).

> I would guess non-silent rules and node tags (but looking at the pest3 grammar of grammars, it seems they were abandoned).

I wasn't sure whether to include those, but here are some thoughts:

  1. With meta-rules without parameters, there is an overlap with silent rules, but I guess the distinction can be that meta-rules are expanded in the parse tree and won't produce events on their own, while silent rules will (the event processor can then ignore them and leave them out of the AST if it chooses to; that's up to each use case's implementation).
  2. Tags aren't included yet, because I wasn't sure whether they'd be needed with the current typed AST. But @oovm had some more ideas for them in "Restoration of the pest3 work effort 🙌" (pest#885 (comment)), and I can imagine that, e.g., group tags could be useful for user events. So I assume we can add tags to pest3's meta-grammar?
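A sketch of point 1, assuming every non-meta rule (silent or not) emits events and the consumer decides what to keep. `Rule`, `Event`, `is_silent` and `without_silent` are hypothetical names; which rules count as silent is hard-coded here instead of coming from a grammar.

```rust
/// Hypothetical rules: `Ws` is assumed to be marked silent in the grammar.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Expr,
    Ws,
    Number,
}

impl Rule {
    /// Assumed lookup of each rule's silent flag.
    pub fn is_silent(self) -> bool {
        matches!(self, Rule::Ws)
    }
}

/// Parameterless meta-rules never appear here: they are assumed to be
/// expanded inline, so only real (possibly silent) rules produce events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start(Rule),
    End(Rule),
}

/// One possible consumer policy: drop silent rules before building an AST.
/// A debugger or CST consumer could keep them instead.
pub fn without_silent<I: IntoIterator<Item = Event>>(events: I) -> Vec<Event> {
    events
        .into_iter()
        .filter(|event| match *event {
            Event::Start(rule) | Event::End(rule) => !rule.is_silent(),
        })
        .collect()
}
```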

> When parsing dumb_choice, the listener will receive an other_rule event, but won't know whether it is starting the left side or the right side of the choice expression. Yet it doesn't matter that much, since that prefix could be factored out of the choice.

I guess for unlabelled choice branches, the event could include the branch index?
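A sketch of the branch-index idea for `dumb_choice`, assuming `other_rule` is just the literal `"x"` and that a choice emits a hypothetical `Branch { index }` event ahead of the events of the branch that matched. Because PEG choices backtrack, each branch's events are buffered and only forwarded once the branch succeeds; otherwise the listener would already have seen `other_rule` from a failed left branch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    DumbChoice,
    OtherRule,
}

/// Hypothetical events; `Branch` reports which alternative of an unlabelled choice matched.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start(Rule),
    End(Rule),
    /// 0-based index of the matched choice branch.
    Branch { index: usize },
}

/// Toy matcher for `other_rule`, assumed here to be just the literal "x".
fn other_rule(input: &str, pos: usize, out: &mut Vec<Event>) -> Option<usize> {
    if input[pos..].starts_with('x') {
        out.push(Event::Start(Rule::OtherRule));
        out.push(Event::End(Rule::OtherRule));
        Some(pos + 1)
    } else {
        None
    }
}

/// Toy matcher for `dumb_choice = other_rule - "a" | other_rule - "b"` over the whole input.
/// Each branch records into its own buffer, which is dropped if the branch fails.
pub fn parse_dumb_choice(input: &str) -> Option<Vec<Event>> {
    for (index, suffix) in ["a", "b"].iter().enumerate() {
        let mut buffer = vec![Event::Branch { index }];
        if let Some(pos) = other_rule(input, 0, &mut buffer) {
            if &input[pos..] == *suffix {
                // Only a successful branch gets its buffered events forwarded.
                let mut events = vec![Event::Start(Rule::DumbChoice)];
                events.append(&mut buffer);
                events.push(Event::End(Rule::DumbChoice));
                return Some(events);
            }
        }
    }
    None
}
```

On `xb`, the listener sees `Branch { index: 1 }` right after `Start(DumbChoice)`, so it knows the `other_rule` that follows belongs to the right side.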

> Here, applying the previous suggestion looks horrible to me. The behaviour is also not well defined, since the rule sometimes has nested elements in it and sometimes not.

Why does it look horrible? In XML, one may also have nested recursive tags and the events will be fired in that order (I think?).
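A recursive-descent sketch of the original `separated` rule, assuming one `Start`/`End` pair per rule invocation and a `Token` event per matched literal (all hypothetical names). As with nested XML tags, each recursive call opens its own pair, so the events come out in nesting order; a failed optional tail is rolled back by truncating the event buffer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Separated,
}

/// Hypothetical events: Start/End per rule match, Token per matched literal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start(Rule),
    End(Rule),
    Token(char),
}

/// Sketch of `separated = "0".."9" - ("," - separated)?`.
/// Every recursive call opens its own Start/End pair, like nested XML tags.
/// Returns the position after the match, or None with no events left behind.
pub fn separated(input: &[u8], pos: usize, out: &mut Vec<Event>) -> Option<usize> {
    let start_len = out.len();
    out.push(Event::Start(Rule::Separated));
    match input.get(pos) {
        Some(&b) if b.is_ascii_digit() => out.push(Event::Token(b as char)),
        _ => {
            out.truncate(start_len); // undo this attempt's events
            return None;
        }
    }
    let mut end = pos + 1;
    // Optional tail ("," - separated)?: drop its events if it does not fully match.
    let mark = out.len();
    if input.get(end) == Some(&b',') {
        out.push(Event::Token(','));
        match separated(input, end + 1, out) {
            Some(next) => end = next,
            None => out.truncate(mark),
        }
    }
    out.push(Event::End(Rule::Separated));
    Some(end)
}
```

On `9,8,1` this yields three nested `Start(Separated)` events and three `End(Separated)` events at the end, the same shape as the nested listing in the next comment.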

Anyway, this is all open to discussion and to implementing whatever makes the most sense (I haven't thought about it in detail; a quick-xml-like API looked like it could work, but there may be cases where it's not nice that didn't occur to me).

@Tartasprint
Contributor

> Yes, it's the closest to what I had in mind. There's also PEGTL:

Reading their Getting Started page, it looks like they have a tracer, which could be similar to events. But I don't remember enough C++ to understand what is going on in there.

> Probably rule-level events when a rule matches (though a debugger etc. would probably need other events).

Agreed. Maybe make a low-level tracer for debugging, like in PEGTL. Such a tracer could maybe even be the main event generator.

I don't think an event generator needs to handle multiple listeners at the same time, so usage could look like:

// For regular user:
for event in event_generator.clone() {
    ...
}

// For low level stuff:
for event in event_generator.clone().low_level() {
    ...
}

// For a step-by-step VM:
let event = event_generator.next();
let low_level = event_generator.new_low_level();

Doing things this way would basically turn the parser into an iterable pest-VM: the regular iteration yields high-level events, and the low-level iteration yields more details, like every attempt/failure.
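A sketch of that iterator design, assuming a toy grammar `number = digit+` with `digit = '0'..'9'`. `EventGenerator`, `Step`, `Matched` and `low_level` are hypothetical names; to keep it short, the whole trace is computed up front, whereas a real pest VM would produce steps lazily. Cloning the generator copies its position, which is what lets `event_generator.clone()` feed either view.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Number,
    Digit,
}

/// Hypothetical low-level tracer step: every attempt and its outcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Attempt(Rule, usize),
    Success(Rule, usize, usize),
    Failure(Rule, usize),
}

/// Hypothetical high-level event: a rule that matched, with its byte span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Matched {
    pub rule: Rule,
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone)]
pub struct EventGenerator {
    steps: Vec<Step>,
    cursor: usize,
}

impl EventGenerator {
    /// Traces the toy grammar `number = digit+`, `digit = '0'..'9'`.
    pub fn new(input: &str) -> Self {
        let bytes = input.as_bytes();
        let mut steps = vec![Step::Attempt(Rule::Number, 0)];
        let mut pos = 0;
        loop {
            steps.push(Step::Attempt(Rule::Digit, pos));
            if bytes.get(pos).map_or(false, |b| b.is_ascii_digit()) {
                steps.push(Step::Success(Rule::Digit, pos, pos + 1));
                pos += 1;
            } else {
                steps.push(Step::Failure(Rule::Digit, pos));
                break;
            }
        }
        steps.push(if pos > 0 {
            Step::Success(Rule::Number, 0, pos)
        } else {
            Step::Failure(Rule::Number, 0)
        });
        EventGenerator { steps, cursor: 0 }
    }

    /// Low-level view: every remaining attempt, success and failure.
    pub fn low_level(self) -> impl Iterator<Item = Step> {
        self.steps.into_iter().skip(self.cursor)
    }
}

/// The regular iteration yields only successful matches (high-level events).
impl Iterator for EventGenerator {
    type Item = Matched;

    fn next(&mut self) -> Option<Matched> {
        while let Some(&step) = self.steps.get(self.cursor) {
            self.cursor += 1;
            if let Step::Success(rule, start, end) = step {
                return Some(Matched { rule, start, end });
            }
        }
        None
    }
}
```

In this sketch, high-level events come out in post-order (each `digit` before the `number` containing it), since a rule is only known to have matched once it ends.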

> So I assume we can add tags to pest3's meta-grammar?

I am not sure about that. Such tags would indeed be useful for generating events, but maybe, if the event system is done right, they will just be boilerplate? The examples given by oovm show tags indicating the start/end of rules (from what I understood; I may be wrong). I hope pest users won't need to add those things.

> the event could include the branch index?

The branch tags could be useful, though, to make this more user-friendly.

> Why does it look horrible?

Rewriting that rule like this:

separated = "0".."9" - (separation - separated)?
separation = ","

is horrible because, to the listener, things will look like this:

start separated
"9"
separation
start separated
"8"
separation
start separated
....
"1"
stop separated
stop separated
...
stop separated

It would be nicer if it looked like:

start separated
"9"
separation
"8"
separation
...
"1"
stop separated

How to get there ? I don't know 😅.
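One possible way to get there, sketched under the assumption that the generator produces the nested stream listed above: a hypothetical adapter `flatten_self_recursion` that drops the inner `Start`/`End` pairs whenever a rule re-enters itself while it is the innermost open rule. All type and function names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Separated,
    Separation,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start(Rule),
    End(Rule),
    /// A leaf rule match, e.g. `separation`.
    Empty(Rule),
    /// A matched literal, e.g. a digit.
    Token(char),
}

/// Hypothetical adapter: collapses direct self-recursion into one Start/End pair.
/// Other events pass through unchanged.
pub fn flatten_self_recursion<I: IntoIterator<Item = Event>>(events: I) -> Vec<Event> {
    // Each open rule, plus whether its Start was suppressed.
    let mut open: Vec<(Rule, bool)> = Vec::new();
    let mut out = Vec::new();
    for event in events {
        match event {
            Event::Start(rule) => {
                // Suppress if the innermost open rule is this same rule.
                let nested = matches!(open.last(), Some(&(top, _)) if top == rule);
                open.push((rule, nested));
                if !nested {
                    out.push(event);
                }
            }
            Event::End(_) => {
                let (_, nested) = open.pop().expect("unbalanced End event");
                if !nested {
                    out.push(event);
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

The trade-off is that this also flattens recursion that carries meaning, such as a parenthesized `expr` directly inside an `expr`, so it would probably have to be opt-in per rule.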

Would it be worthwhile for me to try experimenting with a low-level "tracer" now? From there, if it goes well, it would be possible to experiment with higher-level events.

@tomtau
Contributor Author

tomtau commented Jan 19, 2025

@Tartasprint @TheVeryDarkness one possible alternative output format would be to parse into cstree, so that could be a concrete use case motivating this refactoring: one could choose between outputting the current typed API and cstree.
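A sketch of how the refactoring could decouple the two, assuming a hypothetical `EventSink` trait that the parser drives and each output format implements. `TreeBuilder` is a generic stand-in for a CST backend (it does not use cstree's API), and `SExpr` is a second backend showing the same events producing a different output. The toy `parse_separated` validates its input before emitting anything, so sinks never need to undo events.

```rust
/// Hypothetical output interface: the parser only emits these calls.
pub trait EventSink {
    fn start_node(&mut self, rule: &'static str);
    fn token(&mut self, text: &str);
    fn finish_node(&mut self);
}

#[derive(Debug, PartialEq)]
pub enum Child {
    Node(Node),
    Token(String),
}

#[derive(Debug, PartialEq)]
pub struct Node {
    pub rule: &'static str,
    pub children: Vec<Child>,
}

/// Generic tree builder; a stand-in for a CST backend, not cstree's API.
#[derive(Debug, Default)]
pub struct TreeBuilder {
    stack: Vec<Node>,
    pub root: Option<Node>,
}

impl EventSink for TreeBuilder {
    fn start_node(&mut self, rule: &'static str) {
        self.stack.push(Node { rule, children: Vec::new() });
    }
    fn token(&mut self, text: &str) {
        if let Some(top) = self.stack.last_mut() {
            top.children.push(Child::Token(text.to_string()));
        }
    }
    fn finish_node(&mut self) {
        let node = self.stack.pop().expect("finish_node without start_node");
        // Attach the finished node to its parent, or make it the root.
        match self.stack.last_mut() {
            Some(parent) => parent.children.push(Child::Node(node)),
            None => self.root = Some(node),
        }
    }
}

/// Second backend: renders the same events as an S-expression.
#[derive(Debug, Default)]
pub struct SExpr {
    pub out: String,
}

impl EventSink for SExpr {
    fn start_node(&mut self, rule: &'static str) {
        if !self.out.is_empty() {
            self.out.push(' ');
        }
        self.out.push('(');
        self.out.push_str(rule);
    }
    fn token(&mut self, text: &str) {
        self.out.push(' ');
        self.out.push_str(text);
    }
    fn finish_node(&mut self) {
        self.out.push(')');
    }
}

/// Toy driver for `separated = "0".."9" - ("," - separated)?`.
pub fn parse_separated<S: EventSink>(input: &str, sink: &mut S) -> bool {
    let items: Vec<&str> = input.split(',').collect();
    if !items.iter().all(|item| item.len() == 1 && item.as_bytes()[0].is_ascii_digit()) {
        return false;
    }
    for (i, item) in items.iter().enumerate() {
        sink.start_node("separated");
        sink.token(item);
        if i + 1 < items.len() {
            sink.token(",");
        }
    }
    for _ in &items {
        sink.finish_node();
    }
    true
}
```

Here, `parse_separated("9,8", &mut SExpr::default())` renders `(separated 9 , (separated 8))`, while the same call with a `TreeBuilder` produces the equivalent nested tree.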

@Tartasprint for that event "separation" vs "separated", maybe it could pass some kind of call depth/trace?
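A sketch of the call-depth idea, assuming events are annotated after the fact by a hypothetical `with_depth` pass (a generator could just as well attach the depth itself). With the depth, a listener can tell which `separated` a `separation` belongs to without tracking the `Start`/`End` nesting on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Separated,
    Separation,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Start(Rule),
    End(Rule),
    Empty(Rule),
    Token(char),
}

/// Hypothetical annotated event: `depth` is the number of rules open around it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Traced {
    pub depth: usize,
    pub event: Event,
}

/// Start/End carry the depth at which their rule was entered;
/// events inside the rule carry one more.
pub fn with_depth<I: IntoIterator<Item = Event>>(events: I) -> Vec<Traced> {
    let mut depth = 0usize;
    let mut out = Vec::new();
    for event in events {
        match event {
            Event::Start(_) => {
                out.push(Traced { depth, event });
                depth += 1;
            }
            Event::End(_) => {
                depth = depth.checked_sub(1).expect("unbalanced End event");
                out.push(Traced { depth, event });
            }
            _ => out.push(Traced { depth, event }),
        }
    }
    out
}
```

For the nested events of `9,8` the depths are `0, 1, 1, 1, 2, 1, 0`, so the `separation` at depth 1 belongs to the outermost `separated`.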

Projects
Status: Todo
Development

No branches or pull requests

2 participants