ANTLR4 parser

James Palmer edited this page Apr 1, 2021 · 5 revisions

Version 2.1.6 introduced the first beta version of the ANTLR4 parser. The existing parser (Proparse) was written in the late 2000s using ANTLR2, whose latest version was released in 2006. ANTLR2 was a good choice at the time, but maintaining this grammar has become time-consuming and difficult. The new ANTLR4 grammar brings faster parse times and easier maintenance, but the best part is that the parse tree is easier to enrich. A side benefit is that the grammar can be used from other languages (C#, Go, JavaScript) with a minimal amount of code adaptation (the lexer part).

This rewrite required a lot of work, and we'd like to check whether there are any blocking issues at this stage of development. The idea is simply to verify that the parse trees generated by ANTLR2 and ANTLR4 are identical, and that execution time is relatively consistent. To do that, we'd like you to add sonar.oe.antlr4=true and sonar.oe.antlr4.profiler=true to your set of properties so that the ANTLR4 parser is triggered, then send us the results by mail.

The first part is about parser execution time:

     [java] INFO: 21185 files proparse'd, 5350 XML files, 17019 listing files, 9 failure(s), 1695451 NCLOCs
     [java] INFO: AST Generation | time=532435 ms
     [java] INFO: XML Parsing    | time=55699 ms
     [java] INFO: AST4Generation | time=162096 ms
     [java] INFO: AST4Tree       | time=15077 ms

     [java] INFO: ANTRL4 - 25 longest rules
     [java] INFO: Rule 72 - field | time=14727 ms
     [java] INFO: Rule 51 - expression | time=9645 ms
     [java] INFO: Rule 56 - exprt2 | time=5689 ms
     [java] INFO: Rule 59 - attr_colon | time=3946 ms
     [java] INFO: Rule 3 - blockorstate | time=3568 ms
     [java] INFO: Rule 78 - inuic | time=2107 ms
     [java] INFO: Rule 52 - exprt | time=2079 ms
     [java] INFO: Rule 63 - gwidget | time=882 ms
     ...

     [java] INFO: ANTRL4 - 25 Max lookeahead rules
     [java] INFO: Rule 72 - field | Max lookahead: 13897
     [java] INFO: Rule 406 - defineproperty_accessor | Max lookahead: 6942
     [java] INFO: Rule 511 - extentphrase | Max lookahead: 1869
     [java] INFO: Rule 823 - skipphrase | Max lookahead: 1244
     [java] INFO: Rule 394 - defineparam_var | Max lookahead: 1199
     [java] INFO: Rule 3 - blockorstate | Max lookahead: 1168
     [java] INFO: Rule 59 - attr_colon | Max lookahead: 1047
     ...
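
If you'd rather send a summary than the raw log, the timing lines above can be pulled out with a few lines of Python. This is only a sketch: it assumes the log lines keep the `name | time=N ms` shape shown in the excerpt, and the function name `parse_timings` is made up for this example.

```python
import re

# Matches lines such as "[java] INFO: AST Generation | time=532435 ms".
# The pattern is inferred from the excerpt above, not from the plugin source.
TIMING_RE = re.compile(r"INFO:\s+(?P<name>[^|]+?)\s*\|\s*time=(?P<ms>\d+)\s*ms")


def parse_timings(log_text):
    """Return (phases, rules): two dicts mapping a name to milliseconds."""
    phases, rules = {}, {}
    for line in log_text.splitlines():
        match = TIMING_RE.search(line)
        if match is None:
            continue  # not a timing line (file counts, lookahead lines, ...)
        name, ms = match.group("name"), int(match.group("ms"))
        # Per-rule lines look like "Rule 72 - field"; the rest are phases.
        target = rules if name.startswith("Rule ") else phases
        target[name] = ms
    return phases, rules


# A few lines copied from the excerpt above.
SAMPLE = """\
     [java] INFO: AST Generation | time=532435 ms
     [java] INFO: XML Parsing    | time=55699 ms
     [java] INFO: AST4Generation | time=162096 ms
     [java] INFO: AST4Tree       | time=15077 ms
     [java] INFO: Rule 72 - field | time=14727 ms
"""

if __name__ == "__main__":
    phases, rules = parse_timings(SAMPLE)
    for name, ms in phases.items():
        print(f"{name:<16}{ms / 1000:>10.1f} s")
```

Replace `SAMPLE` with the content of your own log file to get the same table for your codebase.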

The second part is about the parser output. You'll find two directories, antlr2 and antlr4, in the .proparse directory. Please use your favorite diff tool (I'm using Beyond Compare) to compare the contents of both directories, then report the differences. Those files are more or less anonymous; however, if you don't want to send their content, just report the most common differences. An example from a real codebase is:

This difference appears in 50 different files, but it's always the same problem. Please also report any ANTLR4 parse failures found in the log, along with the code snippet leading to the failure if possible.
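
If you don't have a diff tool at hand, or only want to report the most common differences, something along these lines may help. It is a rough sketch rather than part of the plugin: it assumes the files under antlr2 and antlr4 are plain text with matching relative paths, and the function names are invented for this example.

```python
import difflib
from collections import Counter
from pathlib import Path


def relative_files(root):
    """Relative paths (POSIX style) of all files under root, empty if root is missing."""
    if not root.is_dir():
        return set()
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def read_lines(path):
    # Assumption: the dumped trees are text files; undecodable bytes are replaced.
    return path.read_text(encoding="utf-8", errors="replace").splitlines()


def compare_trees(antlr2_dir, antlr4_dir):
    """Return (differing files, files present on one side only, changed-line counts)."""
    antlr2_dir, antlr4_dir = Path(antlr2_dir), Path(antlr4_dir)
    names2, names4 = relative_files(antlr2_dir), relative_files(antlr4_dir)
    unmatched = sorted(names2 ^ names4)
    differing, changes = [], Counter()
    for name in sorted(names2 & names4):
        diff = list(difflib.unified_diff(
            read_lines(antlr2_dir / name), read_lines(antlr4_dir / name),
            lineterm="", n=0))
        # diff[0:2] are the "---"/"+++" headers; "@@" lines only mark hunks.
        changed = [line for line in diff[2:] if not line.startswith("@@")]
        if changed:
            differing.append(name)
            # Keep the +/- marker, ignore indentation, so identical changes
            # found in different files are counted together.
            changes.update(line[0] + line[1:].strip() for line in changed)
    return differing, unmatched, changes


if __name__ == "__main__":
    root = Path(".proparse")
    if (root / "antlr2").is_dir() and (root / "antlr4").is_dir():
        differing, unmatched, changes = compare_trees(root / "antlr2", root / "antlr4")
        print(f"{len(differing)} differing file(s), {len(unmatched)} unmatched file(s)")
        for line, count in changes.most_common(10):
            print(f"{count:6d}  {line}")
    else:
        print("Run this from the directory that contains .proparse")
```

Lines starting with `-` come from the antlr2 output and lines starting with `+` from the antlr4 output, so the top entries of the list are the most common differences to report.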