-
Notifications
You must be signed in to change notification settings - Fork 0
How the parser works
This is our problem: We need a parser that takes Toki Pona sentence and makes AST out of it. The parser must take account all the possibilities of how the sentence is going to be parsed: That "tawa" could be a preposition, or maybe it could be a modifier.
There are many ways to approach this problem. One problem we have is we can't use external libraries, many parsing libraries only takes account of making just one AST, we need more than one. So we can only handwrite our own parser or handwrite a library and use it. I've chosen the latter. We need some flexibilities that the libraries provides.
So, what kind of library? I've opted to making parser combinator, a very nice way to compose parser.
Note: If you know what parser combinator is, you may skip this part.
Parser combinator libraries provides parser combinator for composing parsers. Before explaining what parser combinator is, we need to define what parsers is, as parser combinator can only take certain kind of parsers.
Parsers are function that takes string (or array of anything or some kind of reader) and outputs what it just parsed as well as the rest of the string, leaving out what it have taken. Parsers can be small like it can only parse words or numbers or it can be big as well like it can parse the whole sentence.
It is best explained with black box analogy.
Parsers can output error, which is essential, but we'll ignore it for now.
Parser combinator are functions that... get this... takes parser, could be as many parser as well as other data, and outputs a parser! It takes functions and outputs a function! It's so meta! This is something that you would expect in functional programming. Parser combinator is very much a product functional programming paradigm.
Parser combinator is used to compose parsers. If we have a parser for word and a parser for number. We could make a parser for a sequence of word and number with these two parsers as well as a combinator called "sequence combinator"
This allows us to compose parser, starting from small parsers and gradually moving towards larger parsers in a very declarative way.
In the context of parser combinator, it's best not to think of parsers as functions but rather its own unique thing. Like how we think of 4 as 4 itself instead of "4 tomatoes" "4 coins" etc. Parsers are its own thing.
Note: If you know what this is, you may skip this part.
At this time, I recommend taking a look at the code. Don't worry, it won't bite.
We have:
-
sequence
combinator -
choice
combinator -
map
combinator
There are more combinators but these are essential combinator that you need to know.
The sequence
combinator, well you have seen it before, it takes multiple parsers then outputs a parser that parses a sequence of what those input parsers parses.
TODO: put example here
The choice
combinator takes multiple parsers then output a parser that tries them one by one. If one fails to parse, then it will try the next one until one parses. This is useful if we want to parse things that can take multiple forms. For example, if we're parsing a modifier, we could parse a toki pona word or a capitalized proper word.
TODO: put example here
The map
combinator will take a single parser and a single mapper function. Remember the sequence
combinator? Have you wondered what the parsing result would be if we used that combinator? It would be a tuple, if the sequence
combinator took two parsers, the result would be a two-length tuple, and so on. What if we want a different result? That's where the map
combinator is useful at! It maps the parsing result.
The map
combinator is defined as a method of the Parser
class just so it's nicer to use.
TODO: put example here
We have introduced a very huge topic. Let's not forget why we're making parser combinator. We need a parser library that can output multiple possible AST's. Remember that Toki Pona sentences can have vague constructions.
Typical parser combinator libraries only takes account of having one AST. But we can modify our own.
One difference we're going to make is that parsers will now output multiple data. Each containing what it just parsed as well as the rest of the string. See also: The Output data type.
This also makes the combinator a lot more complicated. Lucky, once we're done with them, we don't have to look at it once again. That's the beauty of abstraction.
Although, since this is an explanation page, we're going to look at it once again.
Let's not forget, the parser has to output multiple result. So we need to take account of this when defining the parser and combinators.
TODO