Warning
This project is still in an early development stage; the API may change frequently.
Text lexer and parser. Compiler frontend framework.
This can be used to quickly prototype your own programming language compiler/translator frontend, or to parse your domain-specific language.
Try it online in the playground.
Install: `yarn add retsac`
- The Lexer, which yields tokens from the input text string (see the minimal sketch after this list).
  - Regex support. See the examples below.
  - Built-in utility functions for common tokens:
    - JavaScript's string literal, numeric literal, integer literal, identifier, etc.
    - JSON's string literal and numeric literal.
  - Support for custom functions.
- The Parser, which works with the lexer and produces an AST (Abstract Syntax Tree).
  - ELR (Expectational LR) parser.
    - Meta characters like `+*?` when defining a grammar rule.
    - Conflict detection, with automatic resolution where possible.
    - Query child nodes with `$('name')` instead of `children[index]`.
    - Top-down traversal of the AST.
    - Bottom-up data reduction.
    - Expect the lexer to yield a specific token kind and/or content.
    - Re-lex the input if parsing fails.
    - DFA serialization & hydration to accelerate future builds.
  - Serializable AST objects to work with other tools (e.g. compiler backend libs like LLVM).
- Strict type checking with TypeScript.
  - This is amazing; you'd better try it out yourself.
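As a minimal sketch of the lexer side of the list above: the builder calls are the same ones used in the full examples below, and the import assumes the package's top-level `Lexer` export, matching the namespaces used in those examples. The actual lexing call is left as a comment because its exact name and the token shape are assumptions, not verified signatures; check the API reference.

```ts
import { Lexer } from "retsac";

// ignore whitespace, match numbers with a regex,
// and register single-character operators without a kind name
const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces())
  .define({ number: /[0-9]+(?:\.[0-9]+)?/ })
  .anonymous(Lexer.exact(..."+-*/()"))
  .build();

// Assumption (not a verified signature): something like
//   const tokens = lexer.lexAll("1 + 2 * (3 - 4)");
// would yield every token with its kind and content;
// see the API reference for the exact lexing methods.
```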
- Documentation & API reference (deprecated, working on a new one).
- A demo programming language which compiles to WebAssembly.
- Build `tmLanguage.json` files in TypeScript with `tmlb`.
- Compose `RegExp` in JavaScript in a readable and maintainable way with `r-compose`.
In this example, we use `AdvancedBuilder` to define grammar rules with `+*?`, define top-down traversers using `traverser`, and query child nodes in grammar rules using `$` and `$$`. All conflicts are resolved automatically. (A hypothetical usage sketch follows the listing.)
const lexer = new Lexer.Builder()
.ignore(Lexer.whitespaces()) // ignore blank characters
.define({
// built-in support for JSON
string: Lexer.json.stringLiteral(),
number: Lexer.json.numericLiteral(),
})
.define(Lexer.wordKind("true", "false", "null")) // token's kind name equals to the literal value
.anonymous(Lexer.exact(..."[]{},:")) // single char borders without a kind name
.build();
export const builder = new ELR.AdvancedBuilder({ lexer })
.data<unknown>()
.define(
{ value: `string | number | true | false | null` },
// eval the only child's text to get the value
(d) => d.traverser(({ children }) => eval(children[0].text!)),
)
.define(
{ value: `object | array` },
// call the only child's traverse method to get the object/array value
(d) => d.traverser(({ children }) => children[0].traverse()),
)
.define(
// `?` for zero or one, `*` for zero or more, use `()` to group
// quote literal values with `'` or `"`
{ array: `'[' (value (',' value)*)? ']'` },
// use `$$` to select all children with the given name
// traverse all values in the array and return the result as an array
(d) => d.traverser(({ $$ }) => $$(`value`).map((v) => v.traverse())),
)
.define(
// use `@` to rename a grammar
{ object_item: `string@key ':' value` },
(d) =>
// return an object
// use `$` to select the first child with the given name
d.traverser(({ $ }) => ({
// eval the key's value, then traverse child to get the entry's value
[eval($(`key`)!.text!)]: $(`value`)!.traverse(),
})),
)
.define({ object: `'{' (object_item (',' object_item)*)? '}'` }, (d) =>
d.traverser(({ $$ }) =>
// every object_item's traverse result is an object, we need to merge them.
Object.assign({}, ...$$(`object_item`).map((item) => item.traverse())),
),
);
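The snippet above only defines the builder; it does not show how a parser is built and fed input. The following is a hypothetical usage sketch: the `entry` option, the `parseAll` call, and the `accept`/`buffer` fields on the result are assumptions about the API rather than verified signatures, so treat it as an outline and check the API reference or the playground for the real entry points.

```ts
// hypothetical usage sketch (names and result shape are assumptions)
const { parser } = builder.build({ entry: "value" });

const res = parser.parseAll(`{ "hello": ["world", 123, true, null] }`);
if (res.accept) {
  // top-down traverse the AST to compute the JavaScript value
  const value = res.buffer[0].traverse();
  console.log(value); // { hello: ["world", 123, true, null] }
}
```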
In this example, we use `reducer` to define bottom-up data reducers, so the result is available as soon as the AST is built. These grammar rules introduce conflicts, so we use the high-level resolver API `priority` to resolve them. (Again, a usage sketch follows the listing.)
const lexer = new Lexer.Builder()
.ignore(Lexer.whitespaces()) // ignore blank characters
.define({ number: /[0-9]+(?:\.[0-9]+)?/ })
.anonymous(Lexer.exact(..."+-*/()")) // operators
.build();
export const builder = new ELR.ParserBuilder({ lexer })
.data<number>()
.define({ exp: "number" }, (d) =>
// the result of the reducer will be stored in the node's value
d.reducer(({ matched }) => Number(matched[0].text)),
)
.define({ exp: `'-' exp` }, (d) => d.reducer(({ values }) => -values[1]!))
.define({ exp: `'(' exp ')'` }, (d) => d.reducer(({ values }) => values[1]))
.define({ exp: `exp '+' exp` }, (d) =>
d.reducer(({ values }) => values[0]! + values[2]!),
)
.define({ exp: `exp '-' exp` }, (d) =>
d.reducer(({ values }) => values[0]! - values[2]!),
)
.define({ exp: `exp '*' exp` }, (d) =>
d.reducer(({ values }) => values[0]! * values[2]!),
)
.define({ exp: `exp '/' exp` }, (d) =>
d.reducer(({ values }) => values[0]! / values[2]!),
)
.priority(
{ exp: `'-' exp` }, // highest priority
[{ exp: `exp '*' exp` }, { exp: `exp '/' exp` }],
[{ exp: `exp '+' exp` }, { exp: `exp '-' exp` }], // lowest priority
);
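As with the JSON example, driving the parser is not shown above; the sketch below is hypothetical in the same way (the `entry` option, `parseAll`, the result fields, and reading the reduced value from the root node's `data` are assumptions, not verified signatures). It also shows what `priority` buys us: `*` binds tighter than `+`, so `2+3*4` is reduced as `2 + (3 * 4)`.

```ts
// hypothetical usage sketch (names and result shape are assumptions)
const { parser } = builder.build({ entry: "exp" });

const res = parser.parseAll("2+3*4");
if (res.accept) {
  // the bottom-up reducers already computed each node's data during parsing,
  // so the root node should hold 2 + (3 * 4) = 14, not (2 + 3) * 4 = 20
  console.log(res.buffer[0].data);
}
```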
All issues and pull requests are welcome.