-
Notifications
You must be signed in to change notification settings - Fork 1
AST Specification
M2J encodes valid Modula-2 input in memory as an abstract syntax tree (AST).
The AST uses three kinds of nodes.
- an
EMPTY
sentinel node - non-terminal nodes
- terminal nodes
The sentinel node EMPTY
,
is used to encode the absence of an optional sub-node in non-terminal nodes.
Non-terminal nodes are used to encode non-terminal symbols such as modules, imports, definitions, declarations, statements and expressions.
AST
,
DEFMOD
,
IMPLIST
,
IMPORT
,
UNQIMP
,
DEFLIST
,
CONSTDEF
,
TYPEDEF
,
PROCDEF
,
SUBR
,
ENUM
,
SET
,
ARRAY
,
RECORD
,
EXTREC
,
VRNTREC
,
POINTER
,
PROCTYPE
,
INDEXLIST
,
FIELDLISTSEQ
,
FIELDLIST
,
VFLISTSEQ
,
VFLIST
,
VARIANTLIST
,
VARIANT
,
CLABELLIST
,
CLABELS
,
FTYPELIST
,
OPENARRAY
,
CONSTP
,
VARP
,
FPARAMLIST
,
FPARAMS
,
IMPMOD
,
BLOCK
,
DECLLIST
,
TYPEDECL
,
VARDECL
,
PROC
,
MODDECL
,
VSR
,
VSFIELD
,
EXPORT
,
QUALEXP
,
STMTSEQ
,
ASSIGN
,
PCALL
,
RETURN
,
WITH
,
IF
,
SWITCH
,
LOOP
,
WHILE
,
REPEAT
,
FORTO
,
EXIT
,
ARGS
,
ELSIFSEQ
,
ELSIF
,
CASELIST
,
CASE
,
ELEMLIST
,
RANGE
,
FIELD
,
INDEX
,
DESIG
,
DEREF
,
NOT
,
AND
,
OR
,
EQ
,
NEQ
,
LT
,
LTEQ
,
GT
,
GTEQ
,
IN
,
NEG
,
PLUS
,
MINUS
,
STAR
,
SLASH
,
DIV
,
MOD
,
SETDIFF
,
FCALL
,
SETVAL
.
Terminal nodes are used to encode terminal symbols such as filenames, options, identifiers, integer literals, real number literals, character code and quoted literals.
FILENAME
,
OPTIONS
,
IDENT
,
IDENTLIST
,
QUALIDENT
,
INTVAL
,
REALVAL
,
CHRVAL
,
QUOTEDVAL
.
AST nodes are represented graphically as follows:
Nodes are represented by rectangular boxes. Arrows indicate subnode connections.
The subnodes of terminal nodes are filled in grey.
Labels are represented by rounded boxes.
Labels with multiple connections represent alternatives.
Any AST node may be represented in a serialised format of the form
( nodetype subnode-0 subnode-1 subnode-2 ... subnode-N )
where the actual number of sub-nodes is dependent on the node type.
This form of tree representation is called an S-expression.
The structure of the common AST is described in detail below.
The EMPTY
node encodes the absence of an optional sub-node.
emptyNode :=
'(' EMPTY ')'
;
The AST
node encodes the root of the syntax tree.
astRootNode :=
'(' AST filename options compilationUnit ')'
;
filename := filenameNode ; /* terminal node */
options := optionsNode ; /* terminal node */
compilationUnit :=
defModuleNode | impModuleNode
;
The DEFMOD
node encodes a definition module.
defModuleNode :=
'(' DEFMOD moduleIdent importList definitionList ')'
;
moduleIdent := identNode ; /* terminal node */
importList :=
importListNode | emptyNode
;
definitionList :=
definitionListNode | emptyNode
;
The IMPLIST
node encodes the entirety of import directives in a module.
importListNode :=
'(' IMPLIST import+ ')'
;
There are two types of import nodes:
import :=
importNode | unqImportNode
;
The IMPORT
node encodes a qualified import directive.
importNode :=
'(' IMPORT identList ')'
;
identList := identListNode ; /* terminal node */
The UNQIMP
node encodes an unqualified import directive.
unqImportNode :=
'(' UNQIMP moduleIdent identList ')'
;
The DEFLIST
node encodes one or more definitions.
definitionListNode :=
'(' DEFLIST definition+ ')'
;
There are four types of definition nodes:
CONSTDEF
,
TYPEDEF
,
VARDECL
,
PROCDEF
.
definition :=
constDefNode | typeDefNode | varDeclNode | ProcDefNode
;
The CONSTDEF
node encodes a constant definition.
constDefNode :=
'(' CONSTDEF identNode exprNode ')'
;
The TYPEDEF
node encodes a type definition.
typeDefNode :=
'(' TYPEDEF identNode ( typeNode | emptyNode ) ')'
;
The PROCDEF
node encodes a procedure definition.
procDefNode :=
'(' PROCDEF identNode formalParamList returnedType ')'
;
formalParamList :=
formalParamListNode | emptyNode
;
returnedType := qualidentNode ; /* terminal node */
There are eleven nodes that may represent the definition part of a type definition or an anonymous type.
IDENT
,
QUALIDENT
,
SUBR
,
ENUM
,
SET
,
ARRAY
,
RECORD
,
EXTREC
,
VRNTREC
,
POINTER
,
PROCTYPE
.
typeNode :=
identNode | qualidentNode |
subrTypeNode | enumTypeNode | setTypeNode | arrayTypeNode | recTypeNode |
extRecTypeNode | vrntRecTypeNode | pointerTypeNode | procTypeNode
;
Nine nodes may represent the type definition part of a field definition or an array base type.
IDENT
,
QUALIDENT
,
SUBR
,
ENUM
,
SET
,
ARRAY
,
RECORD
,
POINTER
,
PROCTYPE
.
fieldType :=
identNode | qualidentNode | subrTypeNode | enumTypeNode | setTypeNode |
arrayTypeNode | recTypeNode | pointerTypeNode | procTypeNode
;
Three nodes may represent the type definition part of a derived type.
derivedType :=
identNode | qualidentNode | subrTypeNode
;
Three nodes represent the type definition part of a record type.
recordType :=
recTypeNode | extRecTypeNode | vrntRecTypeNode
;
The SUBR
node encodes a subrange type definition.
subrTypeNode :=
'(' SUBR lowerBound upperBound subrBaseType ')'
;
lowerBound := exprNode ;
upperBound := exprNode ;
subrBaseType :=
qualidentNode | emptyNode
;
The ENUM
node encodes an enumeration type definition.
enumTypeNode :=
'(' ENUM identListNode ')'
;
The SET
node encodes a set type definition.
setTypeNode :=
'(' SET countableType ')'
;
countableType :=
qualidentNode | subrTypeNode | enumTypeNode
;
The ARRAY
node encodes an array type definition.
arrayTypeNode :=
'(' ARRAY indexTypeListNode arrayBaseType ')'
;
arrayBaseType := fieldType ;
The RECORD
node encodes a non-variant non-extensible record type definition.
recTypeNode :=
'(' RECORD fieldListSeqNode ')'
;
The POINTER
node encodes a pointer type definition.
pointerTypeNode :=
'(' POINTER typeNode ')'
;
The PROCTYPE
node encodes a procedure type definition.
procTypeNode :=
'(' PROCTYPE formalTypeList returnedType ')'
;
formalTypeList :=
formalTypeListNode | emptyNode
;
returnedType :=
qualidentNode | emptyNode
;
The EXTREC
node encodes an extensible record type definition.
extRecTypeNode :=
'(' EXTREC recBaseType fieldListSeqNode ')'
;
recBaseType := qualidentNode ; /* terminal node */
The VRNTREC
node encodes a variant record type definition.
variantRecTypeNode :=
'(' VRNTREC variantFieldListSeqNode ')'
;
The INDEXLIST
node encodes one or more index types within an array type definition.
indexTypeListNode :=
'(' INDEXLIST indexType+ ')'
;
indexType := countableType ;
The FIELDLISTSEQ
node encodes a non-variant field list sequence within a record type definition.
fieldListSeqNode :=
'(' FIELDLISTSEQ fieldListNode+ ')'
;
The FIELDLIST
node encodes a non-variant field list within a field list sequence
fieldListNode :=
'(' FIELDLIST identListNode fieldType ')'
;
The VFLISTSEQ
node encodes a field list sequence within a variant record type definition.
variantFieldListSeqNode :=
'(' VFLISTSEQ ( fieldListNode | variantFieldListNode )+ ')'
;
The VFLIST
node encodes a variant field list within a variant record field list sequence.
variantFieldListNode :=
'(' VFLIST caseIdent caseType variantList defaultFieldListSeq ')'
;
caseIdent :=
identNode | emptyNode
;
caseType := qualidentNode ; /* terminal node */
defaultFieldListSeq :=
fieldListSeqNode | emptyNode
;
The VARIANTLIST
node encodes a variant list within a variant field list.
variantList :=
'(' VARIANTLIST variantNode+ ')'
;
The VARIANT
node encodes a variant within a variant list.
variantNode :=
'(' VARIANT caseLabelListNode fieldListSeqNode ')'
;
The CLABELLIST
node encodes a case label list within a variant record definition or case statement.
caseLabelListNode :=
'(' CLABELLIST caseLabelsNode+ ')
;
The CLABELS
node encodes start and end labels within a case label list.
caseLabelsNode :=
'(' CLABELS startLabel endLabel ')'
;
startLabel := exprNode ;
endLabel :=
exprNode | emptyNode
;
The FTYPELIST
node encodes a formal type list within a procedure type definition.
formalTypeListNode :
'(' FTYPELIST formalType+ ')'
;
formalType :=
simpleFormalType | attrFormalType
;
simpleFormalType :=
typeIdent | openArrayTypeNode
;
attrFormalType :=
constAttrFormalTypeNode | varAttrFormalTypeNode
;
The OPENARRAY
node encodes an open array parameter within a formal type list.
openArrayTypeNode :=
'(' OPENARRAY typeIdent ')'
;
typeIdent := qualidentNode ; /* terminal node */
The CONSTP
node encodes a CONST
parameter within a formal type list.
constAttrFormalTypeNode :=
'(' CONSTP simpleFormalType ')'
;
The VARP
node encodes a VAR
parameter within a formal type list.
varAttrFormalTypeNode :=
'(' VARP simpleFormalType ')'
;
The FPARAMLIST
node encodes a formal parameter list within a procedure type definition or procedure signature.
formalParamListNode :=
'(' FPARAMLIST formalParamsNode+ ')'
;
The FPARAMS
node encodes formal parameters within a formal parameter list.
formalParamsNode :=
'(' FPARAMS identListNode formalTypeNode ')'
;
The IMPMOD
node encodes an implementation module.
impModuleNode :=
'(' IMPMOD moduleIdent importListNode blockNode ')'
;
The BLOCK
node encodes a block within a module or procedure.
blockNode :=
'(' BLOCK declarationList body ')'
;
declarationList :=
declarationListNode | emptyNode
;
body :=
statementSeqNode | emptyNode
;
The DECLLIST
node encodes one or more declarations within a block.
declarationListNode :=
'(' DECLLIST declarationNode+ ')'
;
There are five types of declaration nodes:
CONSTDEF
,
TYPEDECL
,
VARDECL
,
PROC
,
MODDECL
.
declarationNode :=
constDefNode | typeDeclNode | varDeclNode | procDeclNode | modDeclNode
;
The TYPEDECL
node encodes a type declaration.
typeDeclNode :=
'(' TYPEDECL identNode ( typeNode | vsrTypeNode ) ')'
;
The VARDECL
node encodes a variable or field declaration.
varDeclNode :=
'(' VARDECL identListNode fieldType ')'
;
The PROC
node encodes a procedure declaration.
procDeclNode :=
'(' PROC identNode formalParamListNode returnedType blockNode ')'
;
The MODDECL
node encodes a local module declaration.
modDeclNode :=
'(' MODDECL moduleIdent importListNode exportList blockNode ')'
;
exportList :=
unqualExportNode | qualExportNode | emptyNode
;
The VSR
node encodes a variable size record type declaration.
vsrTypeNode :=
'(' VSR fieldListSeqNode varSizeFieldNode ')'
;
The VSFIELD
node encodes the indeterminate field of a variable size record type.
varSizeFieldNode :=
'(' VSFIELD varSizeField determinantField varSizeFieldType ')'
;
varSizeField := identNode ; /* terminal node */
determinantField := identNode ; /* terminal node */
varSizeFieldType := qualidentNode ; /* terminal node */
The EXPORT
node encodes an unqualified export directive within a local module declaration.
unqualExportNode :=
'(' EXPORT identListNode ')'
;
The QUALEXP
node encodes a qualified export directive within a local module declaration.
qualExportNode :=
'(' QUALEXP identListNode ')'
;
The STMTSEQ
node encodes a statement sequence.
statementSeqNode :=
'(' STMTSEQ statementNode+ ')'
;
There are eleven types of statement nodes:
ASSIGN
,
PCALL
,
RETURN
,
WITH
,
IF
,
SWITCH
,
LOOP
,
WHILE
,
REPEAT
,
FORTO
,
EXIT
.
statementNode :=
assignmentNode | controlStmt | withStmtNode
;
Nine statement nodes represent control statements:
PCALL
,
RETURN
,
IF
,
SWITCH
,
LOOP
,
WHILE
,
REPEAT
,
FORTO
,
EXIT
.
controlStmt :=
procCallNode | returnStmtNode | ifStmtNode | caseStmtNode | loopCtrlStmt
;
Five statement nodes represent loop control statements:
LOOP
,
WHILE
,
REPEAT
,
FORTO
,
EXIT
.
loopCtrlStatement :=
loopStmtNode | whileStmtNode | repeatStmtNode | ForToStmtNode | exitStmtNode
;
The ASSIGN
node encodes an assignment statement.
assignmentNode :=
'(' ASSIGN designator exprNode ')'
;
designator :=
identNode | qualidentNode | derefNode | designatorNode
;
The PCALL
node encodes a procedure call statement.
procCallNode :=
'( PCALL designator actualParams ')'
;
actualParams :=
actualParamsNode | emptyNode
;
The RETURN
node encodes a RETURN
statement.
returnStmtNode :=
'(' RETURN returnValue ')'
;
returnValue :=
exprNode | emptyNode
;
The WITH
node encodes a WITH
statement.
withStmtNode :=
'(' WITH designator statementSeqNode ')'
;
The IF
node encodes an IF
statement.
ifStmtNode :=
'(' IF exprNode ifBranch elsifSeq elseBranch ')'
;
ifBranch := statementSeqNode ;
elsifSeq :=
elsifSeqNode | emptyNode
;
elseBranch :=
statementSeqNode | emptyNode
;
The SWITCH
node encodes a CASE
statement.
caseStmtNode :=
'(' SWITCH designator caseListNode elseBranch ')'
;
The LOOP
node encodes a LOOP
statement.
loopStmtNode :=
'(' LOOP statementSeqNode ')'
;
The WHILE
node encodes a WHILE
statement.
whileStmtNode :=
'(' WHILE exprNode statementSeqNode ')'
;
The REPEAT
node encodes a REPEAT
statement.
repeatStmtNode :=
'(' REPEAT statementSeqNode exprNode ')'
;
The FORTO
node encodes a FOR
statement.
forToStmtNode :=
'(' FORTO identNode startValue endValue stepValue statementSeqNode ')'
;
startValue : exprNode ;
endValue := expNode ;
stepValue :=
exprNode | emptyNode
;
The EXIT
node encodes an EXIT
statement.
exitStmtNode :=
'(' EXIT ')'
;
The ARGS
node encodes actual parameters in a procedure or function call.
actualParamsNode :=
'(' ARGS exprNode+ ')'
;
The ELSIFSEQ
node encodes an ELSIF
sequence within an IF
statement.
elsifSeqNode :=
'(' ELSIFSEQ elsifNode+ ')'
;
The ELSIF
node encodes a single ELSIF
branch within an IF
statement.
elsifNode :=
'(' ELSIF exprNode statementSeqNode ')'
;
The CASELIST
node encodes a case list within a CASE
statement.
caseListNode :=
'(' CASELIST caseBranchNode+ ')'
;
The CASE
node encodes a case branch within a CASE
statement.
caseBranchNode :=
'(' CASE caseLabelListNode statementSeqNode ')'
;
The ELEMLIST
node encodes the element list within a set value.
elementListNode :=
'(' ELEMLIST element+ ')'
;
element :=
expr | range
;
The RANGE
node encodes a value range.
range :=
'(' RANGE lowerValue upperValue ')'
;
lowerValue := exprNode ;
upperValue := exprNode ;
The FIELD
node encodes a record field selector.
fieldSelectorNode :=
'(' FIELD selector ')'
;
selector :=
qualidentNode | designatorNode
;
The INDEX
node encodes an array subscript selector.
arrayIndexNode :=
'(' INDEX subscript+ ')'
;
subscript := exprNode ;
The DESIG
node encodes a designator.
designatorNode :=
'(' DESIG head tail ')'
;
head :=
qualidentNode | derefNode
;
tail :=
fieldSelectorNode | arrayIndexNode | emptyNode
;
The DEREF
node encodes a pointer dereference.
derefNode :=
'(' DEREF pointer ')'
;
pointer :=
qualident | derefNode | designatorNode
;
There are 25 nodes that may represent expressions or sub-expressions:
NOT
,
AND
,
OR
,
EQ
,
NEQ
,
LT
,
LTEQ
,
GT
,
GTEQ
,
IN
,
NEG
,
PLUS
,
MINUS
,
STAR
,
SLASH
,
DIV
,
MOD
,
SETDIFF
,
DESIG
,
FCALL
,
SETVAL
.
INTVAL
,
REALVAL
,
CHRVAL
,
QUOTEDVAL
.
expr :=
boolExpr | relationalExpr | arithmeticExpr |
designator | funcCallNode | setValNode | literalValue
;
There are three boolean expression nodes:
boolExpr :=
notNode | andNode | orNode
;
There are seven relational expression nodes:
EQ
,
NEQ
,
LT
,
LTEQ
,
GT
,
GTEQ
,
IN
.
relationalExpr :=
eqNode | neqNode | ltNode | ltEqNode | gtNode | gtEqNode | inNode
;
There are eight arithmetic expression nodes:
NEG
,
PLUS
,
MINUS
,
STAR
,
SLASH
,
DIV
,
MOD
,
SETDIFF
.
arithmeticExpr :=
negNode | plusNode | minusNode | starNode | slashNode |
divNode | modulusNode | setDiffNode
;
There are four literal value expression nodes:
INTVAL
,
REALVAL
,
CHRVAL
,
QUOTEDVAL
.
literalValue :=
intValNode | realValNode | chrValNode | quotedValNode
;
The NOT
node encodes an expression of the form NOT expr
.
notNode :=
'(' NOT right ')'
;
The AND
node encodes an expression of the form expr1 AND expr2
.
andNode :=
'(' AND left right ')'
;
The OR
node encodes an expression of the form expr1 OR expr2
.
orNode :=
'(' OR left right ')'
;
The EQ
node encodes an expression of the form expr1 = expr2
.
eqNode :=
'(' EQ left right ')'
;
left : exprNode ;
The NEQ
node encodes an expression of the form expr1 # expr2
.
neqNode :=
'(' NEQ left right ')'
;
The LT
node encodes an expression of the form expr1 < expr2
.
ltNode :=
'(' LT left right ')'
;
The LTEQ
node encodes an expression of the form expr1 <= expr2
.
ltEqNode :=
'(' LTEQ left right ')'
;
The GT
node encodes an expression of the form expr1 > expr2
.
gtNode :=
'(' GT left right ')'
;
The GTEQ
node encodes an expression of the form expr1 >= expr2
.
gtEqNode :=
'(' GTEQ left right ')'
;
The IN
node encodes an expression of the form expr1 IN expr2
.
inNode :=
'(' IN left right ')'
;
The NEG
node encodes an expression of the form - expr
.
negNode :=
'(' NEG right ')'
;
right : exprNode ;
The PLUS
node encodes an expression of the form expr1 + expr2
.
plusNode :=
'(' PLUS left right ')'
;
The MINUS
node encodes an expression of the form expr1 - expr2
.
minusNode :=
'(' MINUS left right ')'
;
The STAR
node encodes an expression of the form expr1 * expr2
.
starNode :=
'(' STAR left right ')'
;
The SLASH
node encodes an expression of the form expr1 / expr2
.
slashNode :=
'(' SLASH left right ')'
;
The DIV
node encodes an expression of the form expr1 DIV expr2
.
divNode :=
'(' DIV left right ')'
;
The MOD
node encodes an expression of the form expr1 MOD expr2
.
modulusNode :=
'(' MOD left right ')'
;
The SETDIFF
node encodes an expression of the form expr1 \ expr2
.
setDiffNode :=
'(' SETDIFF left right ')'
;
The FCALL
node encodes a function call expression.
funcCallNode :=
'( FCALL designator actualParams ')'
;
The SETVAL
node encodes a set value expression.
setValNode :=
'( SETVAL elementList setTypeIdent ')'
;
elementList :=
actualParams | emptyNode
;
setTypeIdent :=
ident | qualident | emptyNode
;
The FILENAME
node encodes the filename of the source file.
filenameNode :=
'(' FILENAME '"' filename '"' ')'
;
(FILENAME "parser.mod")
The OPTIONS
node encodes the compiler options used when compiling the source file.
optionsNode :=
'(' OPTIONS ( '"' option-name '"' )+ ')'
;
(OPTIONS "--pim4" "--no-synonyms" "--no-coroutines" ...)
The IDENT
node encodes an identifier.
identNode :=
'(' IDENT '"' Ident '"' ')'
;
(IDENT "foobar")
The IDENTLIST
node encodes an identifier list.
identListNode :=
'(' IDENTLIST ( '"' Ident '"' )+ ')'
;
(IDENTLIST "foo" "bar" "baz" ...)
The QUALIDENT
node encodes the component identifiers of a qualified identifier.
qualidentNode :=
'(' QUALIDENT ( '"' Ident '"' ) ( '"' Ident '"' )+ ')'
;
(QUALIDENT "foo" "bar" ...)
The INTVAL
node encodes a whole number value.
intValNode :=
'(' INTVAL ( lexeme | '#' lexeme ) ')'
;
(INTVAL 12345)
(INTVAL #0x7FFF)
The REALVAL
node encodes a real number value.
realValNode :=
'(' REALVAL lexeme ')'
;
(REALVAL 1.234)
(REALVAL 5.678e9)
The CHRVAL
node encodes a character code value.
chrValNode :=
'(' CHRVAL '#' lexeme ')'
;
(CHRVAL #0u7F)
The QUOTEDVAL
node encodes a quoted character or string value.
quotedValNode :=
'(' QUOTEDVAL '"' lexeme '"' ')'
;
(QUOTEDVAL "quoted character or string")
+++
Copyright (C) 2017 Modula-2 Software Foundation