Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text.Parser: support for stateful grammars #4286

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

msmorgan
Copy link
Contributor

@msmorgan msmorgan commented Jan 8, 2018

This PR adds simple state support (state type is constant) to Text.Parser.

Grammar has been renamed GrammarT, and is additionally parameterized by a state type.
Grammar is now a type-level function producing a GrammarT type with unit state.

Introduces get, put, modify, and gets.
Introduces parseState, which parses according to a grammar and an initial state.

Items that need feedback:

  • Function/type names, argument order.
  • Is this advanced enough, or should something with pre- and postconditions such as Control.ST be investigated instead?
  • Does the Failure constructor of ParseResult really need to include a state? Update: State has been removed from the ParseResult.Failure constructor.
  • How to implement runGrammar : innerST -> GrammarT innerST tok c a -> GrammarT outerST tok c (a, innerST). Update: This is implemented, but the compiler hangs.

@msmorgan
Copy link
Contributor Author

msmorgan commented Jan 8, 2018

The latest update causes the compiler to hang when processing the doParse function.

I've added State directly to the Grammar type (now GrammarST), rather than adapting the StateT type to work with it. I attempted to do this, but I encountered problems with the productivity heuristic for totality and infinite types. Because >>= needs to unpack and repack the ST constructor of StateT, Idris cannot verify that applications of >>= are total as it can with Grammar. This works with Grammar alone, because >>= reduces to one of two constructors of Grammar, satisfying the productivity heuristic.

A fix to the above problem with StateT might be to add a new totality heuristic, the likes of "remains infinite in output". If the input to a function is wrapped in Inf, its evaluation must be wrapped in Inf in the output to that function. Such a rule might be enough for Idris to consider the code linked above to be total.

@msmorgan
Copy link
Contributor Author

It turns out Idris doesn't hang indefinitely, but it does take an additional 19 minutes on my machine to build contrib with these changes.

@ahmadsalim
Copy link

@msmorgan Hmm, that seems not nice. I know it is not your fault, but that is a significant slowdown.

@msmorgan
Copy link
Contributor Author

As an experiment, I removed the Run constructor and the code that supports it. The compile time went down drastically, now only 2 minutes longer than before this PR.

However, as Run is used in the implementation of lift, it seems too important to omit from this PR.

@ahmadsalim
Copy link

@msmorgan Any status on this PR?

Also, the tests are currently failing.

@msmorgan
Copy link
Contributor Author

The tests are failing because the builds are timing out, as I mentioned earlier. I'm not sure how to correct this. If I get the time, I could try one of the following approaches. Which do you think has the best chance of speeding up the compiler?

  • Refactor doParse by splitting it into several helper functions that are mutually recursive with doParse.
  • Refactor doParse by modifying the return type somehow.
  • Modify doParse to accept a Stream of input tokens instead of a List, and change the return type to be infinite. I think it's possible to retain totality using this approach, but I don't know if it would be possible to make a total finite interface to this function without assert_total. For example, if this function takes a List tok argument, it could pass a Stream (Maybe tok) to the new, infinite doParse. The problem would arise in the portion that processes the potentially-infinite (even though we know it isn't) result.

The third option doesn't sounds like a very good idea, now that I've written it out.

@ahmadsalim
Copy link

@msmorgan Have you considered doing partial evaluation on the input grammar?

@msmorgan
Copy link
Contributor Author

@ahmadsalim I have not, nor do I know what it is.

@ahmadsalim
Copy link

@msmorgan You can annotate a term with %static if you want Idris to partially evaluate it.

Partial evaluation tries to specialize the code for that argument, and hopefully generate more efficient code 😄

@msmorgan
Copy link
Contributor Author

I tried annotating various arguments to doParse with %static, but observed no improvement in compile time. If I understand the documentation page, the best code generation improvement would come from annotating the grammar argument, (act : GrammarT st tok c ty).

I built Idris with profiling enabled, removed the %static annotations and profiled idris --check Text/Parser/Core.idr. The report indicates that compiler spends the majority of the time in the totality checker, namely the inner function Idris.Totality.checkMP.tryPath. It also appears that a large amount of time is used comparing Idris.Core.TT.Name objects, using the derived Eq.(==) and Ord.compare functions. Perhaps comparison on these objects could be improved via memoization, hashing, or both; or maybe the compiler is sorting Name objects much too frequently for whatever reason.

I've uploaded the full profiling dump (which weighs in at 118 MB) as a 5.5 MB zip archive here.

`Grammar` is now a type-level function producing a grammar with unit
state.
Also, prefer constructors to exported functions in `modify` and `gets`.
This allows a grammar with a different state type to be run
in the current input stream context.

Problematically, this causes the compiler to hang while processing `doParse`.
This was never used.
@ahmadsalim
Copy link

@msmorgan Maybe integrating https://hackage.haskell.org/package/intern in Eq and compare would be good? 😄

@msmorgan
Copy link
Contributor Author

The issue seems to be one of asymptotic complexity, causing this function to be called hundreds of thousands of times while compiling doParse. Speeding up name lookup would be helpful but wouldn't solve the problem.

@msmorgan
Copy link
Contributor Author

I ran profiling for this compile and have uploaded the results here. Be warned: the uncompressed file is enormous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants