Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The future of parsing TLA+ #16

Open
ahelwer opened this issue Dec 12, 2024 · 12 comments
Open

The future of parsing TLA+ #16

ahelwer opened this issue Dec 12, 2024 · 12 comments

Comments

@ahelwer
Copy link

ahelwer commented Dec 12, 2024

Opening this discussion after the December community meeting. This regards future efforts to either unify the various TLA+ parsers, or not. I try to give a fair appraisal of the options but I do have my preferences, since I will probably be the one doing this work!

The Problem

Currently there are two main parsers used by TLA+ language tooling:

  1. SANY, a Java-based parser used by TLC
  2. The TLAPM parser, an OCaml-based parser used by TLAPM

We are faced with three possible choices:

  1. Keep things as they are and continue investing features & fixes into both parsers, supported by development of an implementation-independent suite of TLA+ parser tests similar to the existing syntax corpus; the largest remaining feature here is writing a level checker for TLAPM (basically a simple type checker).
  2. Discard TLAPM's parser and switch TLAPM to use SANY, so as to consolidate developer time; to avoid foreign function interface (FFI) weirdness between OCaml and Java, from TLAPM we would shell out to start a Java VM instance, run SANY, consume the parse output from its XML interface, parse the XML into equivalent datastructures in OCaml, and translate those datastructures into TLAPM's existing internal parse tree format.
  3. Adopt a greenfield approach of writing a third TLA+ parser from scratch, probably in rust; then, transition both TLC and TLAPM to use this new parser. This would have to be motivated by substantial new capabilities, discussed below.

Comparison of SANY and TLAPM parsers

Of the two, SANY is more fully-featured. While it does contain bugs, it accepts all important TLA+ syntax and implements proper semantic analysis and level-checking. In contrast, TLAPM's parser does not accept some TLA+ syntax and does not perform level-checking; also, it is the author's understanding that instead of implementing semantic analysis as a graph of references overlaid on the parse tree, TLAPM simply rewrites/inlines all references to form larger expressions. While this rewriting approach does fit with TLAPM's purpose of rewriting TLA+ syntax into proof obligations in other languages, it has drawbacks; for example, it does not support RECURSIVE definitions, since such an expansion would have no end.

It isn't all downside for the TLAPM parser though, and this is really due to OCaml's language features. OCaml has the ability to express variants, where an object might be one of several possible types with their own named & typed fields; it is then possible to exhaustively pattern-match over the type in a switch statement (many programmers will have been introduced to this concept from rust). This comes in especially useful in parsers where a given syntax tree node is one of several possible kinds and we want to handle them all differently. Java in contrast does not have this ability, and so consuming the parse tree usually requires non-exhaustively switching on an enum then casting a value to a subclass. This section should not be discounted as mere programmer language preference; in practical terms, consuming and manipulating the parse tree emitted by SANY is very difficult and requires a lot of implicit & undocumented knowledge about the order that elements are supposed to occur (or not) in variable-length arrays. Usually the only way to tell is to run a debugger and see what the parser spits out.

Benefits & drawbacks of switching TLAPM to use SANY

The primary benefit to making this switch would be reduced development burden. While parsers are a rare breed of software project where the requirements are so well-understood that they are actually possible to complete, there is a very large amount of work remaining to get both parsers to that point. This work is made more difficult by the necessity for multiple developers to understand the parsers in great depth, so as to review each others' work. While this can be worked around by developing very large test suites - to reduce reviewer trepidation when approving changes in unfamiliar parts of the codebase - it remains the case that developers don't like to sign off on changes they don't understand and so PRs tend to languish for one or more months. It is the author's experience that iteration time is the single greatest factor affecting overall development time, and we can reasonably expect a rate of 1-3 changes/month to result in development time of five years or more for a substantial project.

It is the author's opinion that the primary difficulty in transitioning TLAPM to use SANY's parse output is that no developer active with the project has full living knowledge of how both parsers represent the semantic parse tree, and this project requires us to write a translator from one semantic parse tree to another while preserving a large number of unknown invariants. Thus this project cannot be attacked directly; instead, various plausibly-valuable sub-projects must be worked on to cultivate a theory of how both these codebases work in some developer's head. Thankfully, one such sub-project presents itself: development of a large semantic test corpus, similar to the existing syntactic test corpus. This corpus has obvious value on its own and will also be useful in testing a SANY-to-TLAPM parse tree translator.

One illustrative example of the theory shortfall & a clear unknown is the conceptual equivalence of TLAPM's semantic parse tree to SANY's. As mentioned above, TLAPM takes a rewrite-based approach to semantic resolution. Does this present an obstacle to translating SANY's graph-of-references overlay approach? Does this mean there will need to be additional development on the TLAPM parser before the translation can even work? I don't know.

One final drawback to switching TLAPM to use SANY is the overall complexity of the proposed pipeline. While TLAPM is no stranger to calling out to other programs and parsing their output - this is the approach it takes with all its backend theorem provers - replacing an in-codebase in-language parser with a "run SANY/parse XML/translate datastructures" pipeline does seem to add fragility, as well as an additional dependency on the Java runtime (although users of TLAPM are almost certainly already users of Java-based TLC). This also adds additional time to the parse loop. While it is not possible to appeal to objective criteria here, it is the author's opinion that programmers prize snappy language tooling and waiting on the order of an entire second to parse a few hundred lines of TLA+ should not be viewed as acceptable. This is especially true in the new(ish) domain of language servers, where live updates are expected as users type in their editor. As a somewhat niche research-adjacent language we can get away with some long processing times, but I do think we should aim higher here.

Writing a new parser?

I don't forward this as a very serious proposal at this point. Perhaps in half a decade we will have reached the real limit of what we can do with the existing parser(s) and will want to look into this. I took a brief stab last year at writing a new parser in rust but was ultimately unhappy with the idea of just writing another recursive descent parser and it fizzled out. I'm currently working through a university course on formal languages to hopefully exactly characterize what power/variant of automaton is needed to parse TLA+ and what efficient algorithms exists for doing so. Who knows, maybe a TLA+ parser written in TLA+ will come into existence, if only as part of the language standard!

Any new parser would have to be motivated by a substantial new capability that the existing parsers cannot be modified to possess. Something like structured editing and incremental parsing, with error recovery enhanced by running an algorithm to diff edits against prior successfully-parsed trees. Complicated stuff like that. Although really, I think the killer feature would have to be a finite model-checker rewrite to use compiled bytecode for a huge boost in throughput, which might be better to do in a language other than Java. All speculative at this point.

Conclusion

This was long, but also our discussion in the community meeting was long. Please leave any comments or feedback below. Hopefully we can all read this thread and so shorten our discussion at the next community meeting, or establish a parser working group.

@fhackett
Copy link

Author of multiple "non major" TLA+ parsers here... most notably PGo, but by now it's almost my hobby to try and write more of them as parser tooling stress tests. Thanks for the write-up, especially given I did not make it to the aforementioned discussion.

I actually didn't know the SANY XML exporter existed. That's interesting. Regardless of parser implementation standardization, I think that a common representation of how a spec is "understood" is valuable for experimenting and qualifying what is or is not a correct parse. I should go learn more about this and incorporate it into my testing process.

That is to say, I think any effort in making parsing more consistent and better defined will also benefit TLA+ tooling that is neither SANY nor TLAPM. 👀

@thesammiller
Copy link

Commenting since there was the request, but I am not an expert on parsing so this is mostly a beginner/user perspective.

Personally, I think the second option is good. My main interest in this would be for the XML format to gain support for further transformations, as some third-party tooling might support SMXML (state machine XML) and this could be a great way to expand interest in the project.

I don't think there's anything currently written in rust and I would be careful about adding new languages to the codebase, as that would provide additional overhead to anyone who wanted to read/understand in the future.

@Calvin-L
Copy link

I don't think there's anything currently written in rust and I would be careful about adding new languages to the codebase, as that would provide additional overhead to anyone who wanted to read/understand in the future.

As much as I do love Rust, I completely agree with this point.

I would also vote for option 2, or for option 2.1 ("use SANY but find a better interchange format than XML"), in large part because I also agree that

a common representation of how a spec is "understood" is valuable

Indeed, I think there's a bit more to this puzzle than parsing. SANY does at least four other important jobs:

  1. Has a copy of the definitive standard library.
  2. Translates module names into filenames and reads and parses the whole dependency tree.
  3. Links uses of a name to their definition.
  4. Does important semantic checks (missing definitions, operator arity checking, level checking, etc).

Anyone building tooling for TLA+ will benefit from some or all of those---and anyone using TLA+ tooling will benefit from tools that agree on all 4 points. The one time I wrote some TLA+ tooling I benefited immensely from points 2 and 3 in particular.

@Calvin-L
Copy link

Calvin-L commented Dec 21, 2024

To respond to a few other specific points in @ahelwer's fantastic analysis:

programmers prize snappy language tooling and waiting on the order of an entire second to parse a few hundred lines of TLA+ should not be viewed as acceptable

I agree. We need some numbers to make a decision though. JVM startup time is notoriously long, but it has also improved drastically in recent years. New tools like jlink and graal may give us some ways to avoid any remaining overhead. SANY's runtime should be very quick, even starting in a cold JIT---but I say "should be" because I have never seen it measured. Some ballpark numbers on a laptop would help us decide if there's any hope of getting a SANY run down to a few milliseconds.

^ I'll try to make an afternoon project out of this over the winter holidays (if nobody beats me to it).

Java in contrast does not have [variants]

It does. (Sorry to be a language pedant. 🤣 I don't love Java, but we should be maligning it for other reasons, not this one.) The feature has been added slowly, but all the major pieces (including pattern matching!) are available and stable as of Java 21.

My real point is that we could write a very nice wrapper library in Java 21 (or Rust or OCaml or...) that invokes SANY, parses its output, and returns a pleasant in-memory representation that is fairly easy to consume for toolwriters. The limitations of SANY's implementation language don't matter once you have a stable interchange format.

the primary difficulty in transitioning TLAPM to use SANY's parse output is that no developer active with the project has full living knowledge of how both parsers represent the semantic parse tree, and this project requires us to write a translator from one semantic parse tree to another

Agreed. Further complicating the issue: there is no documentation of SANY's XML format (which is a somewhat direct translation of SANY's internal datastructures, which might not be exactly the right thing to spit out in the first place).

If we do go down the switch-to-SANY route, a large and important part of the project will be simply documenting the interchange format and keeping it stable between releases. I don't know how to estimate that work, but I expect it is substantial.

@ahelwer
Copy link
Author

ahelwer commented Dec 21, 2024

I would love if we could upgrade to a newer version of Java and take advantage of some new language features, especially variants & exhaustive pattern-matching. I recall in tlaplus/tlaplus#951 @lemmy mentioned we should only really upgrade to LTS releases; Java 21 which has variants is LTS, and the next one after that (which includes the Java 22 FFM API necessary for the linked PR) is Java 25 releasing in September 2025.

@lemmy
Copy link
Member

lemmy commented Dec 22, 2024

According to our TLA+ execution statistics, the majority of our users are using Java 11 or Java 17.

tlaplus=# SELECT substring(jvm_version from '^[0-9]+') AS main_version, COUNT(*) AS entry_count FROM execution_description WHERE execution_timestamp >= NOW() - INTERVAL '12 months' GROUP BY main_version  ORDER BY main_version;
 main_version | entry_count
--------------+-------------
 1            |         627
 11           |      226.397
 13           |        4.303
 14           |       51.214
 15           |           4
 16           |         597
 17           |      211.988
 18           |       49.619
 19           |        3472
 20           |         631
 21           |       24.226
 22           |       37.580
 23           |       57.839

@lemmy
Copy link
Member

lemmy commented Dec 24, 2024

Comparison of SANY and TLAPM parsers

Of the two, SANY is more fully-featured. While it does contain bugs, it accepts all important TLA+ syntax and implements proper semantic analysis and level-checking. In contrast, TLAPM's parser does not accept some TLA+ syntax and does not perform level-checking; also, it is the author's understanding that instead of implementing semantic analysis as a graph of references overlaid on the parse tree, TLAPM simply rewrites/inlines all references to form larger expressions. While this rewriting approach does fit with TLAPM's purpose of rewriting TLA+ syntax into proof obligations in other languages, it has drawbacks; for example, it does not support RECURSIVE definitions, since such an expansion would have no end.

FWIW: The lists of known TLAPS parser and semantic checker bugs can be found at https://github.com/tlaplus/tlapm/issues?q=is%3Aissue+is%3Aopen+label%3A%22syntax+parser%22 and https://github.com/tlaplus/tlapm/issues?q=is%3Aissue+is%3Aopen+label%3A%22semantic+checker%22.

Currently, there are 25 known bugs in SANY compared to 23 known bugs in TLAPS' parser and semantic checker (16 in the parser and 7 in the semantic checker). Note that the TLAPS parser and semantic checker implement only a subset of the features available in SANY. Moreover, tlaplus/tlapm#186 discusses a severe performance bottleneck in TLAPS' level checking.

@kape1395
Copy link

From my experience implementing the LSP server for TLAPM, the important differentiator (apart from the correctness) for deciding on the parser is its speed, including the incremental parsing and integration with it. The LSP updates the document with almost every key press (modulus short debounce period); thus, the document is parsed several times per second.

When starting to work on the LSP, I considered using the SANY parser, but integration looked a bit heavy. E.g., java startup times, serialization/deserialization. In my case, it looked like I would need to use the TLAPM functions anyway, like extracting proof obligations.

@Calvin-L
Copy link

Regarding SANY's performance, here are a few ballpark numbers taken on my laptop. They show time to parse and check the TLA+ benchmark models. This does not include exporting the parsed information anywhere, just enough work to spit out "yes" or "no". The --help row is the time SANY takes to print its help info, which I thought might be a good rough measure of JVM startup time.

spec                           | avg time over 10 runs (ms)
------------------------------------------------------------
--help                         | 100
Bloemen/MC.tla                 | 351
Bookkeeper/MC.tla              | 340
EWD840/MC.tla                  | 262
Ghostferry/MC.tla              | 345
Grid5k/MC.tla                  | 240
MongoRepl/MC.tla               | 274
MongoRepl/RaftMongo.tla        | 268
PaxosCommit/MC.tla             | 301
PaxosMadeSimple/MC.tla         | 310
SwarmKit/MC.tla                | 319

So:

  • Today, we might expect SANY to take about 300ms to parse a spec.
  • Our best optimizations probably can't make SANY any faster than 100ms per parse, due to JVM startup costs.

These numbers could probably be drastically improved by implementing a SANY daemon that starts once and keeps the JVM running to parse many specs. I have no easy way to measure what that might look like.

Another option is to use Graal native images. I had to make a few small patches to SANY to get this to work, but here are the same numbers for a lightly-modified ahead-of-time-compiled version of SANY:

spec                           | avg time over 10 runs (ms)
------------------------------------------------------------
--help                         | 5
Bloemen/MC.tla                 | 27
Bookkeeper/MC.tla              | 31
EWD840/MC.tla                  | 16
Ghostferry/MC.tla              | 23
Grid5k/MC.tla                  | 13
MongoRepl/MC.tla               | 16
MongoRepl/RaftMongo.tla        | 13
PaxosCommit/MC.tla             | 18
PaxosMadeSimple/MC.tla         | 19
SwarmKit/MC.tla                | 21

These numbers look very good to me: <10ms startup overhead, <50ms per parse. We could absolutely build tooling around something that fast.

@lemmy
Copy link
Member

lemmy commented Dec 29, 2024

@Calvin-L Can you please also collect numbers that include the XML serialization/export?

@Calvin-L
Copy link

Calvin-L commented Jan 1, 2025

For java tla2sany.xml.XMLExporter -t -o, piping output to /dev/null:

spec                           | avg time over 10 runs (ms)
------------------------------------------------------------
Bloemen/MC.tla                 | 395
Bookkeeper/MC.tla              | 405
EWD840/MC.tla                  | 301
Ghostferry/MC.tla              | 361
Grid5k/MC.tla                  | 265
MongoRepl/MC.tla               | 296
MongoRepl/RaftMongo.tla        | 273
PaxosCommit/MC.tla             | 323
PaxosMadeSimple/MC.tla         | 335
SwarmKit/MC.tla                | 344

...and for a Graal native image:

spec                           | avg time over 10 runs (ms)
------------------------------------------------------------
Bloemen/MC.tla                 | 46
Bookkeeper/MC.tla              | 64
EWD840/MC.tla                  | 28
Ghostferry/MC.tla              | 38
Grid5k/MC.tla                  | 25
MongoRepl/MC.tla               | 29
MongoRepl/RaftMongo.tla        | 24
PaxosCommit/MC.tla             | 34
PaxosMadeSimple/MC.tla         | 33
SwarmKit/MC.tla                | 40
measure.tla                    | 24
stats.tla                      | 24

(And for posterity, these numbers and my earlier ones were collected on commit 40b92d0e).

@kape1395
Copy link

kape1395 commented Jan 2, 2025

The spec, including proofs I'm working on now, takes

time ( java -cp ~/.vscode/extensions/tlaplus.vscode-ide-2025.1.20252/tools/tla2tools.jar tla2sany.SANY some.tla > /dev/null )
real    0m0,727s
user    0m1,711s
sys     0m0,286s

It parses 3278 lines from 16 files. Those include 6 files with 1779 lines written by me. Others are from the standard modules and the tlapm standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants