Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EOF symbol to match end of input #237

Open
erezsh opened this issue Sep 18, 2018 · 17 comments
Open

Add EOF symbol to match end of input #237

erezsh opened this issue Sep 18, 2018 · 17 comments

Comments

@erezsh
Copy link
Member

erezsh commented Sep 18, 2018

Can be signified with $

Must be optional (grammars don't have to contain it to be correct)

Need to make sure it works for both LALR and Earley, and with indentation.

@isidentical
Copy link

Any update?

@erezsh
Copy link
Member Author

erezsh commented Jul 4, 2019

I have a version that should be working for LALR, in branch end_symbol

If anyone wants to test it and let me know how it goes, that would be great!

(that's https://github.com/lark-parser/lark/tree/end_symbol)

@isidentical @jruales @coreyt

@jisaacstone
Copy link

I did a rebase of this onto the master branch. There was a conflict in lalr_parser.py that I'm not sure I resolved correctly.

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

I but all the tests seem to pass. To be sure I added another test in and it failed

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

======================================================================
ERROR: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
    self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
  File "/Users/larpy/lark/lark/lark.py", line 293, in parse
    return self.parser.parse(text, start=start)
  File "/Users/larpy/lark/lark/parser_frontends.py", line 88, in parse
    return self._parse(token_stream, start)
  File "/Users/larpy/lark/lark/parser_frontends.py", line 54, in _parse
    return self.parser.parse(input, start, *args)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 36, in parse
    return self.parser.parse(*args)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 110, in parse
    reduce(arg)
  File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 77, in reduce
    value = self.callbacks[rule](s)
  File "/Users/larpy/lark/lark/parse_tree_builder.py", line 28, in __call__
    res = self.node_builder(children)
  File "/Users/larpy/lark/lark/parse_tree_builder.py", line 121, in __call__
    filtered.append(children[i])
IndexError: list index out of range

----------------------------------------------------------------------

I need to dive deep to get a better understanding of what is going on here exactly, but meanwhile does anything look off about the code I linked or the test I wrote?

@erezsh
Copy link
Member Author

erezsh commented Jan 18, 2020

The test seems fine. I think the exception happens because some of the new preprocessing code isn't aware that $ isn't a real terminal (in the sense that it doesn't match any text)

I'll look into it soon, unless you'll manage to work it out sooner.

@jisaacstone
Copy link

I've done a bit more digging

2 things that may or may not be relevant

1st: above test fails with the same error on the original branch (before rebase)

2nd: Altering the test slightly produces a different error:

        @unittest.skipIf(PARSER!='lalr', "Using the end symbol currently works for LALR only")
        def test_end_symbol2(self):
            grammar = """
                start: (a|b)+
                a: "a" (E|ND)
                b: "b"
                E: $
                ND: "x"
            """
            parser = _Lark(grammar)

            self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
            self.assertRaises(UnexpectedInput, parser.parse, 'ab')
======================================================================
FAIL: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
    parser = _Lark(grammar)
  File "/Users/larpy/lark/tests/test_parser.py", line 535, in _Lark
    return Lark(grammar, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs)
  File "/Users/larpy/lark/lark/lark.py", line 206, in __init__
    self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)
  File "/Users/larpy/lark/lark/load_grammar.py", line 494, in compile
    for name, (term_tree, priority) in term_defs if term_tree]
  File "/Users/larpy/lark/lark/load_grammar.py", line 494, in <listcomp>
    for name, (term_tree, priority) in term_defs if term_tree]
  File "/Users/larpy/lark/lark/lexer.py", line 69, in __init__
    assert isinstance(pattern, Pattern), pattern
AssertionError: None

----------------------------------------------------------------------

In the rebase case this fails in a different way, because an assert has been added

https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L637

removing the assert and it fails in the same way.

Hope these findings are helpful. About to go vacation so probably won't get back to looking at this for a couple weeks at least

@erezsh
Copy link
Member Author

erezsh commented Jan 23, 2020

Hi @jisaacstone,

I just created a new branch, end_symbol3, which merges end_symbol with the most recent master.

I don't get the same exceptions as you. In fact, everything works perfectly for me.

I also added the two tests you outline here (renamed the 2nd version to test_end_symbol3, with a few corrections), and they all seem to pass.

Let me know if you think I missed something. Otherwise, all you have left to do, is to get it to work for Earley.

Enjoy your vacation!

@jisaacstone
Copy link

OK Looks like my original thought was correct and this section of code i did not resolve the conflicts correctly

https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660

Should ought to have gone back to that when I hit trouble. Sorry for the misdirection

@jnwatson
Copy link

+1 for $ to represent end of string. I just ran into a need for this.

@erezsh
Copy link
Member Author

erezsh commented Jun 12, 2020

Just pushed this into master. You can now use $ in the LALR parser. (No Earley support yet)

Here's an example from the tests that shows what it does:

def test_end_symbol2(self):
    grammar = """
        start: (a|b)+
        a: "a" ("x"|$)
        b: "b"
    """
    parser = _Lark(grammar)

   self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
   self.assertRaises(UnexpectedInput, parser.parse, 'ab')

@kneufeld
Copy link

kneufeld commented Nov 1, 2020

I've just checked master and I can't find test_end_symbol2 and the example grammar listed above doesn't import.

E           lark.exceptions.GrammarError: Unexpected input at line 2 column 13 in calculator/grammar.lark:
E
E           a: "a" ("x"|$)
E                       ^

lark/lark/load_grammar.py:890: GrammarError

I don't claim to be even remotely be an expert on parsing but not being able to match $ seems to make even the simplest of grammars unparseable.

For instance (simplified):

start: expr+

expr: expr OPER expr
      | SIGNED_NUMBER

OPER: "+" | "-" | "*" | "/"
parse("1-1") # Tree('start', [Tree('statement', [1.0]), Tree('statement', [-1.0])])

whereas start: (expr ("\n"|$))+ would actually pick expr oper expr

@kneufeld
Copy link

kneufeld commented Nov 1, 2020

Looks like you reverted in 3bee210 but you don't say why.

@MegaIng
Copy link
Member

MegaIng commented Nov 1, 2020

If I remember correctly, there were some problems with unexpected side effects.

I don't claim to be even remotely be an expert on parsing but not being able to match $ seems to make even the simplest of grammars unparseable.

Maybe, but your example is wrong:

start: expr ("\n" expr?)*

works.

@kneufeld
Copy link

kneufeld commented Nov 1, 2020

So it does... again, not an expert, but that's very non intuitive.

Thanks for the help!

@erezsh
Copy link
Member Author

erezsh commented Nov 1, 2020

@kneufeld I reverted it because it was buggy.

I agree it would be a nice feature, but I don't think it's a necessity.

@ThatXliner
Copy link
Contributor

Is there any update? I kind of need this. Or is there a workaround?

@erezsh
Copy link
Member Author

erezsh commented Apr 17, 2021

I'll see what I can do

@erezsh
Copy link
Member Author

erezsh commented Apr 18, 2021

@ThatXliner See PR #880

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants