Keeping last token on error recovery #41

thehangedman · 2021-01-04T12:38:22Z

Hello,

I'm parsing a simple C-like language and I've got a problem with error recovery.

I provide an ErrorRule for member declarations:
member_declaration.ErrorRule = SyntaxError + ";" | SyntaxError + "}"

But it doesn't work well with incomplete declarations like
{
    int ; // missing name identifier
}
or
{
    int value // missing ;
}

When the parser meets ";" or "}", it starts error recovery by skipping that token and then looks for the next ";" or "}" instead. This either messes up the next declaration (if one follows and ends with ";"), or consumes the closing "}" and parses the next top-level declaration as a nested one.

It seems it would be better if the parser did not skip the current token (";" or "}" in the examples above) but kept it as the current one when recovery starts. Recovery would then immediately find ";" or "}" and consume it, finishing successfully without messing up further declarations (at least in the first example).
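To make the difference concrete, here is a minimal toy model in Python. It is not Irony code: it assumes recovery is nothing more than a linear scan for a synchronizing token, and the `recover` function, the `keep_current` flag and the token lists are hypothetical names made up for this illustration.

```python
# Toy model of the two recovery strategies discussed above.
# Assumption: recovery is a plain scan for the next synchronizing token.
SYNC = {";", "}"}


def recover(tokens, error_pos, keep_current):
    """Simulate recovery after an error detected on tokens[error_pos].

    Returns (swallowed, resume_pos): the tokens absorbed by the error
    rule and the index at which normal parsing would resume.
    """
    # Behaviour described in the thread: the offending token is skipped
    # before the scan starts. Proposed behaviour: scan from that token.
    pos = error_pos if keep_current else error_pos + 1
    while pos < len(tokens) and tokens[pos] not in SYNC:
        pos += 1
    # The synchronizing token (if any) is consumed by the error rule.
    resume_pos = min(pos + 1, len(tokens))
    return tokens[error_pos:resume_pos], resume_pos


if __name__ == "__main__":
    # "{ int ; int b ; }": the first declaration has no name, so the
    # error is detected on the first ";" (index 2).
    tokens = ["{", "int", ";", "int", "b", ";", "}"]
    print(recover(tokens, 2, keep_current=False))
    print(recover(tokens, 2, keep_current=True))
```

With `keep_current=False` the scan swallows the whole following declaration `int b ;`; with `keep_current=True` only the offending ";" is absorbed and parsing resumes at the next declaration. In the missing-";" example the token absorbed would be the closing "}", which is why the benefit is only clear-cut for the first example.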

I tried modifying ErrorRecoveryParserAction as follows to test the idea:
var currentToken = context.CurrentToken;
errorShiftAction.Execute(context);
context.CurrentToken = currentToken;
and it seems to work well for the tests above.

Or is there any better way to achieve the same result?

rivantsov · 2021-01-16T00:11:02Z

You are right, error recovery is not perfect, and there may be better ways to do it, like the one you suggest. You just need to verify that it works better not only for this particular case, but also for other cases in your language, for other languages, and (!!!) for other 'typical' typing mistakes. I implemented error recovery based on the primitive recovery methods for LALR parsers suggested years and years ago in the dragon books.
The whole framework was actually built following the classic books on compilers, which were about compiling source files on disk. Error recovery there was a way to 'try' to get back to some consistent state and continue discovering more errors. The compilation had already failed; the only goal left was to find more errors, if any, and report all of them to the programmer in the printout he gets in the morning from the big machine (those were the days, programs queued for the CPU on mainframes :-) ).
Nowadays we have IDEs, and compilers are heavily used in interactive editors to check syntax while the user types. The challenges are bigger: we try to recover and keep discovering things further down the file, not just to find more errors but to recover the 'good parts', so that IntelliSense keeps working and showing things like available methods. This is quite different from the error printouts of the 60s and 70s.
So in general, Irony's approach to recovery is primitive by today's standards. Improving things like 'skipping better to a known symbol' would not fix the general problem; that would require quite a different approach. Some day in the future.
thank you
Roman
