Skip to content
/crackono.sh edited this page Dec 8, 2020 · 5 revisions

Syntax analysis (parser)

About

Second most important part of our compiler is parser. Parser takes token stream from scanner and, depending on rules specified in LL table (see LL_table.png), transforms them into Abstract Syntax Tree (AST), which, in our case, is Binary Search Tree (BST). In parser.c this is implemented as various functions which acts as LL rules. If token stream taken from parser has invalid syntax, parser stops it's action and returns 2, which indicates error in syntax analysis, otherwise performs semantic analysis and it also can call precedent_analyse function from precedent.h.

Parsing using precedent syntax analysis

Because some expressions contain arithmetic and logical operators, we have to check priority for those operators and evaluate correct order in which we perform those operations. Priority for those operators are specified by precedent table (see TODO). Precedent syntax analysis is not part of parser files but is implemented in standalone precedent files. At start it takes token from parser and then, one by one, gets next tokens and determines what should do next

Functions

  1. Parser - Main function of compiler, prepares symtable and stack for further usage, then calls program
  2. Program - Strips comments and EOLs and then calls prolog, eolM and functionsBlock
  3. Prolog - Checks whether file starts with package main
  4. EolM - Checks whether prologue is ended with EOL and then calls eolR
  5. EolR - Strips EOLs
  6. FunctionsBlock - Checks that token is KEYWORD func and calls function and functionNext, after that checks that main function was found only once
  7. Function - Checks that next token is IDENTIFIER, if so then it saves it's name to functions symtable. Then it looks for ( and calls arguments func. If arguments was successful and current token is ) it calls functionReturn and commandBlock
  8. FunctionNext - Checks if there are any other functions, if so, calls function and functionNext
  9. Arguments - Checks whether token is IDENTIFIER, if so calls type and argumentNext
  10. Type - Checks whether token equals to int, float64, string or bool
  11. ArgumentNext - If token is comma, then perform action as arguments function
  12. FunctionReturn - If token is ( calls functionReturnType, otherwise return 0, because return () could be omitted.
  13. FunctionReturnType - If token equals to KEYWORD, calls type and functionReturnTypeNext and then looks for ), if it equals to ), continue with program
  14. FunctionReturnTypeNext - Checks that token is COMMA, another token should be KEYWORD and calls type and functionReturnTypeNext
  15. CommandBlock - Checks that token is {, then gets another which should be EOL, if so strips remaining EOLs. If next token does not equal to }, calls commands and then checks that token equals to }, then look for one required EOL and strip remaining EOLs
  16. Commands - Calls command, checks that command is ended with EOL, strips remaining EOLs and then determines whether token equals to }, if so returns result, otherwise calls commands
  17. Command - Contains switch statement which determines next action. If token equals to IDENTIFIER calls statement. If token equals to KEYWORD go to another switch statement determining what kind of action we should perform based on passed KEYWORD:
    • IF - call commandBlock and ifElse functions.
    • FOR - call forDefine and then checks that token equals to ;
    • RETURN - call returnCommand
    • default - invalid KEYWORD passed
  18. Statement - Contains another switch statement which distinguishes between those token types:
    • ( - calls arguments function and checks that token equals to )
    • = - calls assignment function
    • := - calls assignment function
    • +=, -=, *=, /= - calls unary function
    • default - calls multipleID, then checks that token equals to = and calls assignment
  19. MultipleID - Checks that token equals to COMMA, increases number of IDs, then checks that next token is IDENTIFIER and calls multipleID
  20. Assignment - Contains switch that distinguishes between those token types:
    • IDENTIFIER - checks that next token is (, calls arguments and checks that next token is )
    • default - calls expressionNext function
  21. Unary - Only checks that token is unary type
  22. ExpressionNext - Checks whether token equals to COMMA, then checks that number of IDs - 1 is greater than 0, then calls expresssionNext function
  23. IfElse - If token is KEYWORD and equals to ELSE then it calls ifElseExpanded function
  24. IfElseExpanded - If token equals to IF KEYWORD then calls commandBlock and ifElse functions, otherwise calls commandBlock only
  25. ForDefine - Checks that token's type equals to IDENTIFIER, if so checks whether next token equals to :=
  26. ForAssign - Determines whether token is IDENTIFIER and next token is =
  27. ReturnCommand - Checks that token is KEYWORD RETURN and calls returnStatement
  28. ReturnStatement - todo

Usage

The only function which program should call is program(), which calls other functions as it progresses through token stream (see Functions 1.). If no syntax error is found while parsing tokens, pass stack of symtable to semantic analyzer which performs own checks on it. Parser is able to return various codes:

  • 0 - if everything was successful
  • 1 - if lexical error
  • 2 - if syntax error
  • 3 - if semantic error in program (undefined function, variable, ...)
  • 4 - if semantic error in type assignment to new variable
  • 5 - if semantic error in type compatibility (arithmetics, ...)
  • 6 - if semantic error in program (invalid number of params or return values)
  • 7 - if other semantic error
  • 9 - if zero division
  • 99 - if internal error
  • -1 - if internal warning

IFJ20

Menu

Authors

Clone this wiki locally