Skip to content

How the parser works

Koko edited this page Jan 14, 2024 · 26 revisions

This is our problem: We need a parser that takes Toki Pona sentence and makes AST out of it. The parser must take account all the possibilities of how the sentence is going to be parsed: That "tawa" could be a preposition, or maybe it could be a modifier.

There are many ways to approach this problem. One problem we have is we can't use external libraries, many parsing libraries only takes account of making just one AST, we need more than one. So we can only handwrite our own parser or handwrite a library and use it. I've chosen the latter. We need some flexibilities that the libraries provides.

So, what kind of library? I've opted to making parser combinator, a very nice way to compose parser.

What is parser combinator?

Note: If you know what parser combinator is, you may skip this part.

Parser combinator libraries provides parser combinator for composing parsers. Before explaining what parser combinator is, we need to define what parsers is, as parser combinator can only take certain kind of parsers.

Parsers are function that takes string (or array of anything or some kind of reader) and outputs what it just parsed as well as the rest of the string, leaving out what it have taken. Parsers can be small like it can only parse words or numbers or it can be big as well like it can parse the whole sentence.

It is best explained with black box analogy.

Parsers can output error, which is essential, but we'll ignore it for now.

Parser combinator are functions that... get this... takes parser, could be as many parser as well as other data, and outputs a parser! It takes functions and outputs a function! It's so meta! This is something that you would expect in functional programming. Parser combinator is very much a product functional programming paradigm.

Parser combinator is used to compose parsers. If we have a parser for word and a parser for number. We could make a parser for a sequence of word and number with these two parsers as well as a combinator called "sequence combinator"

This allows us to compose parser, starting from small parsers and gradually moving towards larger parsers in a very declarative way.

Woah woah woah, I'm confused now

In the context of parser combinator, it's best not to think of parsers as functions but rather its own unique thing. Like how we think of 4 as 4 itself instead of "4 tomatoes" "4 coins" etc. Parsers are its own thing.

So many possibilities!

We have introduced a very huge topic. Let's not forget why we're making parser combinator. We need a parser library that can output multiple possible AST's. Remember that Toki Pona sentences can have vague constructions.

Typical parser combinator libraries only takes account of having one AST. But we can modify our own.

One difference we're going to make is that parsers will now output multiple data. Each containing what it just parsed as well as the rest of the string. See also: The Output data type.

This also makes the combinator a lot more complicated. Lucky, once we're done with them, we don't have to look at it once again. That's the beauty of abstraction.

Although, since this is an explanation page, we're going to look at it once again.

The insides of combinator

At this time, I recommend taking a look at the code. Don't worry, it won't bite.

TODO

Clone this wiki locally