Parser

Syntax analysis (parser)

About

Second most important part of our compiler is parser. Parser takes token stream from scanner and, depending on rules specified in LL table (see LL_table.png), transforms them into Abstract Syntax Tree (AST), which, in our case, is Binary Search Tree (BST). In parser.c this is implemented as various functions which acts as LL rules. If token stream taken from parser has invalid syntax, parser stops it's action and returns 2, which indicates error in syntax analysis, otherwise performs semantic analysis and it also can call precedent_analyse function from precedent.h.

Parsing using precedent syntax analysis

Because some expressions contain arithmetic and logical operators, we have to check priority for those operators and evaluate correct order in which we perform those operations. Priority for those operators are specified by precedent table (see TODO). Precedent syntax analysis is not part of parser files but is implemented in standalone precedent files. At start it takes token from parser and then, one by one, gets next tokens and determines what should do next

Functions

Parser - Main function of compiler, prepares symtable and stack for further usage, then calls program
Program - Strips comments and EOLs and then calls prolog, eolM and functionsBlock
Prolog - Checks whether file starts with package main
EolM - Checks whether prologue is ended with EOL and then calls eolR
EolR - Strips EOLs
FunctionsBlock - Checks that token is KEYWORD func and calls function and functionNext, after that checks that main function was found only once
Function - Checks that next token is IDENTIFIER, if so then it saves it's name to functions symtable. Then it looks for ( and calls arguments func. If arguments was successful and current token is ) it calls functionReturn and commandBlock
FunctionNext - Checks if there are any other functions, if so, calls function and functionNext
Arguments - Checks whether token is IDENTIFIER, if so calls type and argumentNext
Type - Checks whether token equals to int, float64, string or bool
ArgumentNext - If token is comma, then perform action as arguments function
FunctionReturn - If token is ( calls functionReturnType, otherwise return 0, because return () could be omitted.
FunctionReturnType - If token equals to KEYWORD, calls type and functionReturnTypeNext and then looks for ), if it equals to ), continue with program
FunctionReturnTypeNext - Checks that token is COMMA, another token should be KEYWORD and calls type and functionReturnTypeNext
CommandBlock - Checks that token is {, then gets another which should be EOL, if so strips remaining EOLs. If next token does not equal to }, calls commands and then checks that token equals to }, then look for one required EOL and strip remaining EOLs
Commands - Calls command, checks that command is ended with EOL, strips remaining EOLs and then determines whether token equals to }, if so returns result, otherwise calls commands
Command - Contains switch statement which determines next action. If token equals to IDENTIFIER calls statement. If token equals to KEYWORD go to another switch statement determining what kind of action we should perform based on passed KEYWORD:
- IF - call commandBlock and ifElse functions.
- FOR - call forDefine and then checks that token equals to ;
- RETURN - call returnCommand
- default - invalid KEYWORD passed
Statement - Contains another switch statement which distinguishes between those token types:
- ( - calls arguments function and checks that token equals to )
- = - calls assignment function
- := - calls assignment function
- +=, -=, *=, /= - calls unary function
- default - calls multipleID, then checks that token equals to = and calls assignment
MultipleID - Checks that token equals to COMMA, increases number of IDs, then checks that next token is IDENTIFIER and calls multipleID
Assignment - Contains switch that distinguishes between those token types:
- IDENTIFIER - checks that next token is (, calls arguments and checks that next token is )
- default - calls expressionNext function
Unary - Only checks that token is unary type
ExpressionNext - Checks whether token equals to COMMA, then checks that number of IDs - 1 is greater than 0, then calls expresssionNext function
IfElse - If token is KEYWORD and equals to ELSE then it calls ifElseExpanded function
IfElseExpanded - If token equals to IF KEYWORD then calls commandBlock and ifElse functions, otherwise calls commandBlock only
ForDefine - Checks that token's type equals to IDENTIFIER, if so checks whether next token equals to :=
ForAssign - Determines whether token is IDENTIFIER and next token is =
ReturnCommand - Checks that token is KEYWORD RETURN and calls returnStatement
ReturnStatement - todo

Usage

The only function which program should call is program(), which calls other functions as it progresses through token stream (see Functions 1.). If no syntax error is found while parsing tokens, pass stack of symtable to semantic analyzer which performs own checks on it. Parser is able to return various codes:

0 - if everything was successful
1 - if lexical error
2 - if syntax error
3 - if semantic error in program (undefined function, variable, ...)
4 - if semantic error in type assignment to new variable
5 - if semantic error in type compatibility (arithmetics, ...)
6 - if semantic error in program (invalid number of params or return values)
7 - if other semantic error
9 - if zero division
99 - if internal error
-1 - if internal warning

IFJ20

Authors

Dominik Horky
- @horsecz ([email protected])
Roman Janiczek
- @theleteron ([email protected])
Lukas Hais
- @crackonosh ([email protected])
Jan Pospisil
- @zelick0 ([email protected])

Provide feedback

Saved searches

Use saved searches to filter your results more quickly