Getting started
This getting-started guide implements a simple arithmetic parser. The parser processes simple text expressions containing sums of numbers, and the working program returns the sum contained in the sample string.
For example, once the parser rules are set up, the string "1 + 2 + 3" will be parsed to 6. Likewise, the string " 3 " will be parsed to 3. Non-numeric characters will produce a lexical error. Trailing + signs will be ignored.
Install from the NuGet gallery GUI or with the Package Manager Console using the following command:
Install-Package sly
or with the .NET CLI:
dotnet add package sly
The first stage of a parser is the scanning step. From an input source the scanner extracts your language's tokens (words). In this example, our language will have a few simple tokens:
- numbers (we will stay with integers)
- the "+" operator
We also need to add a skippable whitespace lexeme (WS) to instruct the scanner to ignore characters that have no meaning in our lexicon. Such patterns are also known as trivia. Whitespace will therefore not reach the parser, which improves its performance. This is usually a good idea since it avoids cluttering the parser with trivia.
Csly encodes token definitions in a C# enum type annotated with custom attributes. Each token is annotated with a regular expression that matches it. Here is the full lexer:
using sly.lexer;

public enum ExpressionToken
{
    [Lexeme("[0-9]+")]
    INT = 1,

    [Lexeme("\\+")]
    PLUS = 2,

    // Marked isSkippable: the lexeme is discarded and never sent to the parser.
    [Lexeme("[ \\t]+", isSkippable: true)]
    WS = 3
}
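To see what these three patterns match, here is a minimal sketch that applies the same regular expressions by hand with System.Text.RegularExpressions. It does not use csly and is not a description of how csly scans internally; the Scan helper and its tuple output are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of csly: tries each pattern at the current
// position and drops WS matches, mimicking what isSkippable asks for.
static List<(string Name, string Text)> Scan(string source)
{
    // \G anchors each pattern at the position passed to Match.
    var patterns = new (string Name, Regex Pattern, bool Skip)[]
    {
        ("INT",  new Regex(@"\G[0-9]+"), false),
        ("PLUS", new Regex(@"\G\+"),     false),
        ("WS",   new Regex(@"\G[ \t]+"), true),
    };

    var tokens = new List<(string Name, string Text)>();
    var position = 0;
    while (position < source.Length)
    {
        var matched = false;
        foreach (var (name, pattern, skip) in patterns)
        {
            var match = pattern.Match(source, position);
            if (!match.Success) continue;
            if (!skip) tokens.Add((name, match.Value));
            position += match.Length;
            matched = true;
            break;
        }
        if (!matched)
            throw new FormatException($"Lexical error at position {position}: '{source[position]}'");
    }
    return tokens;
}

foreach (var (name, text) in Scan("1 + 2 + 3"))
    Console.WriteLine($"{name} '{text}'");
```

For "1 + 2 + 3" this yields five tokens (INT, PLUS, INT, PLUS, INT) and no whitespace, which is the stream the parser would have to deal with. A character that matches none of the patterns, such as a letter, raises the error.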
Our grammar in BNF notation is quite simple:
expression : INT
expression : term PLUS expression
term : INT
⚠️ Note that this grammar is right-recursive, so it produces right-associative operations.
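To make the right-associativity concrete, the following sketch mirrors the two expression rules with a plain recursive function and prints how the operands end up grouped. It is independent of csly; Group is a hypothetical helper, and it assumes a non-empty list of operands.

```csharp
using System;

// Hypothetical helper, not part of csly. It mirrors the rules
//   expression : INT
//   expression : term PLUS expression
// and returns a parenthesized string showing the grouping of operands.
static string Group(string[] operands, int start = 0)
{
    // expression : INT (last operand, nothing left to recurse into)
    if (start == operands.Length - 1)
        return operands[start];

    // expression : term PLUS expression (the recursion is on the right)
    return $"({operands[start]} + {Group(operands, start + 1)})";
}

Console.WriteLine(Group(new[] { "1", "2", "3" }));
// prints "(1 + (2 + 3))"
```

For addition the grouping does not change the result, but it would matter for a non-associative operator such as subtraction.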
Csly uses BNF notation and attaches visitor methods to each rule. The methods are "visitors" in the sense that they will be used to traverse the syntax tree generated by csly during the parse operation. Visitor methods have the following properties:
- the return type is the type of the parse result: int for our language,
- the parameters of the visitor match each clause of the right-hand side of a rule.
⚠️ Read the Defining your parser section carefully to implement your expression parser (visitor) methods correctly.
Here is the visitor method for the first rule:
[Production("expression: INT")]
public int Primary(Token<ExpressionToken> intToken)
{
    return intToken.IntValue;
}
The whole parser is then:
using sly.lexer;
using sly.parser.generator;

public class ExpressionParser
{
    [Production("expression: INT")]
    public int Primary(Token<ExpressionToken> intToken)
    {
        return intToken.IntValue;
    }

    // The PLUS token is received as a parameter but is not needed to compute the sum.
    [Production("expression: term PLUS expression")]
    public int Expression(int left, Token<ExpressionToken> operatorToken, int right)
    {
        return left + right;
    }

    [Production("term: INT")]
    public int Term(Token<ExpressionToken> intToken)
    {
        return intToken.IntValue;
    }
}
To build a parser from this class, use a ParserBuilder:
using sly.parser;
using sly.parser.generator;

public class SomeClass
{
    public static Parser<ExpressionToken, int> GetParser()
    {
        var parserInstance = new ExpressionParser();
        var builder = new ParserBuilder<ExpressionToken, int>();
        // "expression" is the root rule of the grammar.
        var parser = builder.BuildParser(parserInstance, ParserType.LL_RECURSIVE_DESCENT, "expression").Result;
        return parser;
    }
}
The parser can then be called on an input string:
using System;
using System.Linq;

public class SomeTest
{
    public void TestCSLY()
    {
        string expression = "42 + 42";
        var parser = SomeClass.GetParser();
        var r = parser.Parse(expression);
        if (!r.IsError)
        {
            Console.WriteLine($"result of <{expression}> is {(int)r.Result}");
            // outputs: result of <42 + 42> is 84
        }
        else if (r.Errors != null && r.Errors.Any())
        {
            // display errors
            r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
        }
    }
}
Next steps:
- lexers
  - regex-based lexer: slow but fully customizable
  - generic lexer: fast and matches almost every lexing use case
- parsers: