Language Design, Parser References
@jaybosamiya (March 24th):
- `rust-analyzer` (https://github.com/rust-analyzer/rust-analyzer): a popular "IDE backend" language server for Rust.
  - The parser can be found in `crates/parser/src/grammar/...`
  - It does its parsing via a library they call `rowan` (https://github.com/rust-analyzer/rowan).
  - Details about how stuff fits together: https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md
  - A blog post about "ungrammar", which allows for a clean (~600 lines including comments: `rust.ungram`) superset of the concrete grammar against which the handwritten parser is matched. The handwritten parser resolves ambiguity, precedence, and such, while the "ungrammar" is easier to work with from a tree perspective: https://rust-analyzer.github.io/blog/2020/10/24/introducing-ungrammar.html
- The main Rust parser (https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse) is also hand-rolled.
- `peg` (https://crates.io/crates/peg) is a macro-heavy parser generator based on Parsing Expression Grammars.
- `pest` (https://crates.io/crates/pest) provides a clean, high-performance parser generator; it also uses Parsing Expression Grammars.
- `nom` (https://crates.io/crates/nom) is a parser-combinator framework. It provides a large set of useful combinators, but it can be a bit annoying to find the exact combinator you want. I've found it mostly usable, but I would emphatically suggest against implementing a full-blown Rust grammar in it.
- `nom-peg` (https://crates.io/crates/nom-peg) is a Parsing Expression Grammar interface to `nom`.
- `lalrpop` (https://crates.io/crates/lalrpop) is a parser generator using LR(1) or LALR(1).
- `tree-sitter` (https://crates.io/crates/tree-sitter) provides bindings to the Tree-Sitter parsing library. I've heard good things about Tree-Sitter (and it already has a grammar for Rust), but I don't know how extensible it is.
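
To make concrete what it means for a hand-rolled parser to "resolve ambiguity, precedence, and such", here is a minimal sketch of a hand-written expression parser that uses a binding-power table. It is not code from `rust-analyzer` or `rustc`; the names `Expr`, `Parser`, `binding_power`, and `parse` are hypothetical and chosen only for illustration, and the grammar is a toy arithmetic language rather than Rust.

```rust
/// AST for a toy arithmetic language (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Bin(Box<Expr>, char, Box<Expr>),
}

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    fn skip_ws(&mut self) {
        while self.pos < self.src.len() && self.src[self.pos] == b' ' {
            self.pos += 1;
        }
    }

    /// Look at the next non-space byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        self.skip_ws();
        self.src.get(self.pos).copied()
    }

    /// The grammar `E -> E op E | num` is ambiguous on its own;
    /// this table is where the hand-written parser resolves it.
    fn binding_power(op: u8) -> Option<u8> {
        match op {
            b'+' | b'-' => Some(1),
            b'*' | b'/' => Some(2),
            _ => None,
        }
    }

    fn number(&mut self) -> Result<Expr, String> {
        self.skip_ws();
        let start = self.pos;
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
            self.pos += 1;
        }
        if start == self.pos {
            return Err(format!("expected a number at byte {}", start));
        }
        // The slice contains only ASCII digits, so it is valid UTF-8.
        let text = std::str::from_utf8(&self.src[start..self.pos]).unwrap();
        text.parse::<i64>().map(Expr::Num).map_err(|e| e.to_string())
    }

    /// Parse an expression whose operators all bind at least as tightly as `min_bp`.
    fn expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        let mut lhs = self.number()?;
        while let Some(op) = self.peek() {
            let bp = match Self::binding_power(op) {
                Some(bp) if bp >= min_bp => bp,
                _ => break,
            };
            self.pos += 1; // consume the operator
            // `bp + 1` on the right-hand side makes operators left-associative.
            let rhs = self.expr(bp + 1)?;
            lhs = Expr::Bin(Box::new(lhs), op as char, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Parse a complete input string, rejecting trailing garbage.
fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser::new(src);
    let e = p.expr(0)?;
    match p.peek() {
        None => Ok(e),
        Some(_) => Err(format!("unexpected input at byte {}", p.pos)),
    }
}
```

With this sketch, `parse("1 + 2 * 3")` nests the multiplication under the addition, and `parse("8 - 3 - 2")` groups as `(8 - 3) - 2`. A declarative grammar (in the spirit of the "ungrammar" above) would only need to say that a binary expression has two operands and an operator; the precedence and associativity decisions live entirely in the hand-written code.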
With regard to `rust-analyzer`, my impression is that it is a very well-designed project with extensibility kept explicitly in mind, so hopefully the parser is also well designed for extensibility (and, briefly looking around, I found it quite understandable and clean to work with). At a high level, I feel that even if it might not be perfect for our use case, it might be better than having to write a full grammar from scratch if a Rust grammar doesn't exist for the parser generator we choose. Of course, the above list is not exhaustive, and someone else might find even more applicable libraries we can use.
`rust-analyzer` is meant to be a good frontend for Rust, so it has good error messages. `rustc`'s parser definitely has good error messages (as you must have seen whenever writing Rust). `pest` has nice error messages. IIRC, `tree-sitter` is heavily influenced by academic work on error recovery (and is designed explicitly to be part of the editor cycle), so it would very likely have good error messages, though I'm not sure. `nom` can have good messages, but from personal experience it is annoying to get it to behave nicely for error reporting. Others: no idea.
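
As a rough illustration of the minimum a parser needs to carry around for readable errors, here is a small sketch that turns a byte offset into a line and column and renders a caret under the offending position. It is not taken from any of the libraries above; `line_col` and `render_error` are hypothetical helper names, and the output layout is only loosely modelled on the style of compiler diagnostics.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Assumption: `offset` falls on a character boundary; columns are counted in bytes,
/// so the caret lines up only for ASCII input.
fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset.min(src.len())];
    let line = before.matches('\n').count() + 1;
    // Bytes since the last newline (or since the start of the input).
    let col = before
        .rfind('\n')
        .map_or(before.len(), |i| before.len() - i - 1)
        + 1;
    (line, col)
}

/// Render a message together with the offending source line and a caret marker.
fn render_error(src: &str, offset: usize, msg: &str) -> String {
    let (line, col) = line_col(src, offset);
    let text = src.lines().nth(line - 1).unwrap_or("");
    format!(
        "error: {}\n --> {}:{}\n  | {}\n  | {}^",
        msg,
        line,
        col,
        text,
        " ".repeat(col - 1)
    )
}
```

The point of the sketch is that a parser which records byte offsets on its tokens or tree nodes can produce this kind of message after the fact. A parser that only returns "parse failed" cannot, and that is roughly the extra plumbing that makes error reporting with a combinator library more work.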