Adding descriptions seems to mess up error reporting #302

pragdave · 2021-06-02T00:10:55Z

Given a simple grammar:

Myopic {
  Program  = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   ExpressionList "}" 
  number = digit+
}

I feed it a program with a deliberate error, and I get a beautiful error:

This is correct, the "?" is not a valid expression, and it is flagged as such.

However, if I add a description (to any of the rules),

Myopic {
  Program (p) = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   ExpressionList "}" 
  number = digit+
}

the error now gets shuffled up to the top-level:

Am I misunderstanding how this is supposed to work?

Cheers (and thanks for a very nice-looking project)

Dave Thomas
@pragdave

The text was updated successfully, but these errors were encountered:

alexwarth · 2021-06-02T02:04:34Z

Hi @pragdave,

When a rule R has a description, it "swallows up" any parse failures that happen down the call stack and instead reports errors using its description, at the position where R was applied. But if R succeeds, it will simply discard the swallowed-up failures that were backtracked over.

That's why you stopped seeing a helpful error message after you attached a description to Program. A Program is an ExpressionList, which in turn is zero or more Expressions. Your sample input block { 1 ? 2 } doesn't begin with a valid expression and so Program succeeds without consuming any input. (The empty string is a valid Program.)

So Program succeeded and discarded all of those failures that would have been helpful to see (}, digit, and block). Then Ohm expected to see the end of the input stream but instead it found block { 1 ? 2 }. And because of the failure-discarding semantics that I described above, Expected end of input is all we can say.

This is certainly not desirable here, and we're going to have to rethink the discard-failures-on-success thing. I don't have a solution in mind yet, but I'll think on it and have a chat with @pdubroy to see if we can come up with something that works.

Best,
Alex

p.s. Thanks for the kind words :)

pragdave · 2021-06-02T03:46:57Z

Wow: I didn't expect a response so soon, and such a detailed one at that.

I see the motivation for what it does, but as you know it doesn't help if the grammar can recurse from the lowest levels back up to the top, because all errors become global.

Let me bounce a half-formed idea off you. Rather than having errors get caught it the highest possible level, would it be possible to define a one-way fence (à la Prolog or Snobol). This would allow the parse to search forward through it, but would act as an error boundary during backtracking.

For example, in my block example, once I've seen the opening brace following block, I'm committed. There is no other possible parse except for a bunch of expressions followed by a closing brace.

In that case, any error inside that block (or, I guess, below that block) is local to the context of that error. An invalid expression means that the expression is wrong, and not that the overall block is wrong.

Clearly, this is something that depends on the grammar being written. But being able to prevent backtracking can both increase performance and also give better error contexts.

Maybe something like:

Myopic {
  Program  = ExpressionList
  ExpressionList  = Expression*
  Expression = BlockExpression   | number
  BlockExpression  = "block" "{"   |>   ExpressionList "}" 
  number = digit+
}

where |> is meant to show the one-way flow through the parse at that point.

I am a total bystander when it comes to parsing theory, and I'm sure there are a gazillion holes in this. But I'm hoping that perhaps it might trigger a neuron and lead to some interesting discussions.

Dave

pdubroy · 2021-07-04T10:11:22Z

@pragdave I think having some kind of cut operator could be a really interesting idea. The paper Packrat Parsers Can Handle Practical Grammars in Mostly Constant Space explores the automatic insertion of cut points as a performance enhancement, but maybe there are also some insights here that could help improve our error handling.

pdubroy · 2021-07-04T15:28:44Z

Thinking about this a bit more...a cut usually only prevents backtracking within its enclosing rule, but it seems like what @pragdave proposes would be more like a global cut. I wonder how this would work inside a lookahead expression.

I also wonder how this would work in the presence of grammar extension. It seems to me that you could only write add these cuts with a global knowledge of the whole grammar, which prevent other authors from extending the grammar in certain ways.

BTW, I also found Error reporting in Parsing Expression Grammars which is a nice summary of some different approaches for error handling in PEGs.

pragdave · 2021-07-05T14:27:56Z

The throw/catch labelled failures in that paper look ;like they'd be really useful. Dave

DavidSouther mentioned this issue Jul 17, 2022

Implement Throw and Label operations #387

Open

pdubroy mentioned this issue Feb 25, 2024

Syntax error handling? #466

Closed

novusnota mentioned this issue May 27, 2024

refactor(grammar): simplify grammatical categories tact-lang/tact#365

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding descriptions seems to mess up error reporting #302

Adding descriptions seems to mess up error reporting #302

pragdave commented Jun 2, 2021

alexwarth commented Jun 2, 2021

pragdave commented Jun 2, 2021

pdubroy commented Jul 4, 2021

pdubroy commented Jul 4, 2021

pragdave commented Jul 5, 2021 via email

Adding descriptions seems to mess up error reporting #302

Adding descriptions seems to mess up error reporting #302

Comments

pragdave commented Jun 2, 2021

alexwarth commented Jun 2, 2021

pragdave commented Jun 2, 2021

pdubroy commented Jul 4, 2021

pdubroy commented Jul 4, 2021

pragdave commented Jul 5, 2021 via email