Understating ATN state changes #4370
-
Hi all! I use ANTLR4 to parse many input files in a large variety of languages and build custom tree representations from them. These trees require to have specific quantifier nodes being the parent of subtrees quantified with After investigating the generated parser, I've noticed that the ATN state changes encode very similar information that I need. Fortunately this information is available in the Python runtime by overriding the setter of the I have a minimal repro, printing the state changes and the transitions:
from antlr4 import InputStream, CommonTokenStream
from antlr4.atn.ATNState import ATNState
from LoopLexer import LoopLexer
from LoopParser import LoopParser
class MyParser(LoopParser):
@property
def state(self):
return self._stateNumber
@state.setter
def state(self, atnState: int):
self._stateNumber = atnState
state = self.atn.states[atnState]
t = ", ".join([f'{ATNState.serializationNames[transition.target.stateType]}({transition.target.stateNumber})' for transition in state.transitions])
print(f'{atnState:5} {self.ruleNames[state.ruleIndex]:5}: {ATNState.serializationNames[state.stateType]:20} [{t}]')
def main():
lexer = LoopLexer(InputStream("XXa"))
parser = MyParser(CommonTokenStream(lexer))
parser.start()
main() Output
In state 0, the possible next state is 7, which is indeed the second state. However, state 2 predicts that the follow-up states are either 5 ( I know that it is an internal API and it seems to work well. But I don't understand why are some state changes "missing" (especially that setting Thanks in advance! |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 2 replies
-
It appears from your output that there are steps missing between NFA 15 to NFA 4. From NFA state 15, input "X", the transitions are NFA 15->16->3->(pop stack)->6->9->7->5->4. The interpreter likely works through this sequence before calling the code to set state number using closure() on the next NFA state. The transitions between 16 and 4 are all empty transitions, less the stack pop, and likely just ends up in NFA state 7. But because the next input is again another "X", and we are in the middle of "start", it's already cached a computed DFA state for rule "start" on "X" from NFA 7, the interpreter goes to NFA 4. The only way to really understand this code is to debug the calls to AdaptivePredict(). Note, another method that sometimes helps in understanding the interpreter is to turn on the "newish" parser trace option and try to follow along with the output. However, in this example, the trace is unrevealing. |
Beta Was this translation helpful? Give feedback.
It appears from your output that there are steps missing between NFA 15 to NFA 4. From NFA state 15, input "X", the transitions are NFA 15->16->3->(pop stack)->6->9->7->5->4. The interpreter likely works through this sequence before calling the code to set state number using closure() on the next NFA state. The transitions between 16 and 4 are all empty transitions, less the stack pop, and likely just ends up in NFA state 7. But because the next input is again another "X", and we are in the middle of "start", it's already cached a computed DFA state for rule "start" on "X" from NFA 7, the interpreter goes to NFA 4. The only way to really understand this code is to debug the calls to Adapt…