DiscreteTom/retsac

Retsac


Warning

This project is still in an early development stage; the API may change frequently.

Text lexer and parser. Compiler frontend framework.

This can be used to quickly prototype your own programming language compiler/translator frontend, or to parse your domain-specific language.

Try it online in the playground.

Installation

yarn add retsac

Features

  • The Lexer, which yields tokens from the input text string.
    • Regex support. See the examples below.
    • Built-in utility functions.
      • JavaScript's string literal, numeric literal, integer literal, identifier, etc.
      • JSON's string literal and numeric literal.
    • Support for custom functions.
  • The Parser, which works with the lexer to produce an AST (Abstract Syntax Tree).
    • ELR (Expectational LR) parser.
      • Meta characters like +*? when defining grammar rules.
      • Conflict detection, with automatic conflict resolution where possible.
      • Query child nodes with $('name') instead of children[index].
      • Top-down AST traversal.
      • Bottom-up data reduction.
      • Expect the lexer to yield a specific token kind and/or content.
      • Re-lex the input if parsing fails.
      • DFA serialization & hydration to accelerate future builds.
    • Serializable AST objects to work with other tools (e.g. compiler backend libs like LLVM).
  • Strict type checking with TypeScript.
    • This is amazing; try it out for yourself.
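To make the lexing model concrete, here is a small self-contained sketch of how a rule-based lexer yields tokens from an input string. The Rule/Token shapes and the lex function below are made up for illustration; they are not retsac's actual API.

```typescript
// Hypothetical rule-based lexer: try each rule at the current position
// and yield the first match as a token; `ignore` rules consume input
// without producing tokens (like whitespace above).
type Rule = { kind: string; pattern: RegExp; ignore?: boolean };
type Token = { kind: string; content: string };

function lex(input: string, rules: Rule[]): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const rule of rules) {
      // sticky flag anchors the regex at the current position
      const re = new RegExp(rule.pattern.source, "y");
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        if (!rule.ignore) tokens.push({ kind: rule.kind, content: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`unexpected character at ${pos}`);
  }
  return tokens;
}

const tokens = lex('{ "a": 1 }', [
  { kind: "ws", pattern: /\s+/, ignore: true },
  { kind: "string", pattern: /"(?:\\.|[^"\\])*"/ },
  { kind: "number", pattern: /[0-9]+(?:\.[0-9]+)?/ },
  { kind: "border", pattern: /[[\]{},:]/ },
]);
// tokens: `{`, `"a"`, `:`, `1`, `}`
```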

Examples

In this example, we use AdvancedBuilder to define grammar rules with +*?, define top-down traversers using traverser, and query nodes in grammar rules using $ and $$.

All conflicts are auto resolved.

const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces()) // ignore blank characters
  .define({
    // built-in support for JSON
    string: Lexer.json.stringLiteral(),
    number: Lexer.json.numericLiteral(),
  })
  .define(Lexer.wordKind("true", "false", "null")) // token's kind name equals the literal value
  .anonymous(Lexer.exact(..."[]{},:")) // single char borders without a kind name
  .build();

export const builder = new ELR.AdvancedBuilder({ lexer })
  .data<unknown>()
  .define(
    { value: `string | number | true | false | null` },
    // eval the only child's text to get the value
    (d) => d.traverser(({ children }) => eval(children[0].text!)),
  )
  .define(
    { value: `object | array` },
    // call the only child's traverse method to get the object/array value
    (d) => d.traverser(({ children }) => children[0].traverse()),
  )
  .define(
    // `?` for zero or one, `*` for zero or more, use `()` to group
    // quote literal values with `'` or `"`
    { array: `'[' (value (',' value)*)? ']'` },
    // use `$$` to select all children with the given name
    // traverse all values in the array and return the result as an array
    (d) => d.traverser(({ $$ }) => $$(`value`).map((v) => v.traverse())),
  )
  .define(
    // use `@` to rename a grammar
    { object_item: `string@key ':' value` },
    (d) =>
      // return an object
      // use `$` to select the first child with the given name
      d.traverser(({ $ }) => ({
        // eval the key's value, then traverse child to get the entry's value
        [eval($(`key`)!.text!)]: $(`value`)!.traverse(),
      })),
  )
  .define({ object: `'{' (object_item (',' object_item)*)? '}'` }, (d) =>
    d.traverser(({ $$ }) =>
      // every object_item's traverse result is an object, we need to merge them.
      Object.assign({}, ...$$(`object_item`).map((item) => item.traverse())),
    ),
  );
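To illustrate the top-down traverse idea independently of retsac, here is a minimal hand-built AST for a small JSON value. The Node shape and the leaf/arr/obj helpers are hypothetical, but the traverse logic mirrors the grammar callbacks above: each node computes its value by calling its children's traverse methods.

```typescript
// Hypothetical AST node: retsac builds these for you from the grammar.
type Node = {
  kind: string;
  text?: string;
  children: Node[];
  traverse(): unknown;
};

// leaf nodes (string/number/true/false/null) evaluate their own text
const leaf = (kind: string, text: string): Node => ({
  kind,
  text,
  children: [],
  traverse() {
    return JSON.parse(text);
  },
});

// an array node traverses all of its `value` children
const arr = (...values: Node[]): Node => ({
  kind: "array",
  children: values,
  traverse() {
    return this.children.map((c) => c.traverse());
  },
});

// an object node merges each key/value entry into one object,
// mirroring the Object.assign in the `object` grammar rule above
const obj = (...items: [string, Node][]): Node => ({
  kind: "object",
  children: items.map(([, v]) => v),
  traverse() {
    return Object.assign(
      {},
      ...items.map(([k, v]) => ({ [JSON.parse(k)]: v.traverse() })),
    );
  },
});

const ast = obj(
  ['"a"', leaf("number", "1")],
  ['"b"', arr(leaf("true", "true"), leaf("null", "null"))],
);
const value = ast.traverse(); // { a: 1, b: [true, null] }
```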

In this example, we use reducer to define bottom-up data reducers, so the result is available as soon as the AST is built.

These grammar rules introduce conflicts; we use the high-level resolver API priority to resolve them.

const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces()) // ignore blank characters
  .define({ number: /[0-9]+(?:\.[0-9]+)?/ })
  .anonymous(Lexer.exact(..."+-*/()")) // operators
  .build();

export const builder = new ELR.ParserBuilder({ lexer })
  .data<number>()
  .define({ exp: "number" }, (d) =>
    // the result of the reducer will be stored in the node's value
    d.reducer(({ matched }) => Number(matched[0].text)),
  )
  .define({ exp: `'-' exp` }, (d) => d.reducer(({ values }) => -values[1]!))
  .define({ exp: `'(' exp ')'` }, (d) => d.reducer(({ values }) => values[1]))
  .define({ exp: `exp '+' exp` }, (d) =>
    d.reducer(({ values }) => values[0]! + values[2]!),
  )
  .define({ exp: `exp '-' exp` }, (d) =>
    d.reducer(({ values }) => values[0]! - values[2]!),
  )
  .define({ exp: `exp '*' exp` }, (d) =>
    d.reducer(({ values }) => values[0]! * values[2]!),
  )
  .define({ exp: `exp '/' exp` }, (d) =>
    d.reducer(({ values }) => values[0]! / values[2]!),
  )
  .priority(
    { exp: `'-' exp` }, // highest priority
    [{ exp: `exp '*' exp` }, { exp: `exp '/' exp` }],
    [{ exp: `exp '+' exp` }, { exp: `exp '-' exp` }], // lowest priority
  );
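The bottom-up reduction can likewise be sketched on its own: once priority has fixed the tree shape, each node computes its value from its children's already-computed values. The Exp type and reduce function below are illustrative, not retsac's API; each case mirrors one reducer in the builder above.

```typescript
// Hypothetical expression AST after parsing; retsac builds this for you.
type Exp =
  | { kind: "num"; text: string }
  | { kind: "neg"; inner: Exp }
  | { kind: "bin"; op: "+" | "-" | "*" | "/"; left: Exp; right: Exp };

// bottom-up: children are reduced first, then the parent combines them
function reduce(e: Exp): number {
  switch (e.kind) {
    case "num":
      return Number(e.text);
    case "neg":
      return -reduce(e.inner);
    case "bin": {
      const l = reduce(e.left);
      const r = reduce(e.right);
      const ops = { "+": l + r, "-": l - r, "*": l * r, "/": l / r };
      return ops[e.op];
    }
  }
}

// because `*` has higher priority than `-`, `2 - 3 * 4` parses as 2 - (3 * 4)
const result = reduce({
  kind: "bin",
  op: "-",
  left: { kind: "num", text: "2" },
  right: {
    kind: "bin",
    op: "*",
    left: { kind: "num", text: "3" },
    right: { kind: "num", text: "4" },
  },
}); // -10
```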

Contribute

All issues and pull requests are welcome.