-
Notifications
You must be signed in to change notification settings - Fork 1
AST Specification
M2J encodes valid Modula-2 input in memory as an abstract syntax tree (AST).
The AST uses three kinds of nodes.
- an
EMPTY
sentinel node - non-terminal nodes
- terminal nodes
The sentinel node EMPTY
is used to encode the absence of an optional sub-node in non-terminal nodes.
Non-terminal nodes are used to encode non-terminal symbols such as modules, imports, definitions, declarations, statements and expressions.
AST
, DEFMOD
, IMPLIST
, IMPORT
, UNQIMP
, DEFLIST
, CONSTDEF
, TYPEDEF
, PROCDEF
, SUBR
, ENUM
, SET
, ARRAY
, RECORD
, POINTER
, PROCTYPE
, EXTREC
, VRNTREC
, INDEXLIST
, FIELDLISTSEQ
, FIELDLIST
, VFLISTSEQ
, VFLIST
, VARIANTLIST
, VARIANT
, CLABELLIST
, CLABELS
, FTYPELIST
, OPENARRAY
, CONSTP
, VARP
, FPARAMLIST
, FPARAMS
, IMPMOD
, BLOCK
, DECLLIST
, TYPEDECL
, VARDECL
, PROC
, MODDECL
, VSR
, VSFIELD
, EXPORT
, QUALEXP
, STMTSEQ
, ASSIGN
, PCALL
, RETURN
, WITH
, IF
, SWITCH
, LOOP
, WHILE
, REPEAT
, FORTO
, EXIT
, ARGS
, ELSIFSEQ
, ELSIF
, CASELIST
, CASE
, FIELD
, INDEX
, DESIG
, DEREF
, NOT
, AND
, OR
, NEG
, EQ
, NEQ
, LT
, LTEQ
, GT
, GTEQ
, IN
, PLUS
, MINUS
, STAR
, SLASH
, DIV
, MOD
, SETDIFF
, FCALL
, SETVAL
.
Terminal nodes are used to encode terminal symbols such as filenames, options, identifiers, integer literals, real number literals, character code and quoted literals.
IDENT
, QUALIDENT
, IDENTLIST
, INTVAL
, REALVAL
, CHRVAL
, QUOTEDVAL
, FILENAME
, OPTIONS
.
Any AST node may be represented in a serialised format of the form
( nodetype subnode-0 subnode-1 subnode-2 ... subnode-N )
where the actual number of sub-nodes is dependent on the node type.
This form of tree representation is called an S-expression.
The structure of the AST used by M2Sharp is described below in S-expression format.
The EMPTY
node encodes the absence of an optional sub-node.
emptyNode :=
'(' EMPTY ')'
;
The AST
node encodes the root of the syntax tree.
astRootNode :=
'(' AST filename options compilationUnit ')'
;
filename := filenameNode ; /* terminal node */
options := optionsNode ; /* terminal node */
compilationUnit :=
defModuleNode | impModuleNode
;
The DEFMOD
node encodes a definition module.
defModuleNode :=
'(' DEFMOD moduleIdent importList definitionList ')'
;
moduleIdent := identNode ; /* terminal node */
importList :=
importListNode | emptyNode
;
definitionList :=
definitionListNode | emptyNode
;
The IMPLIST
node encodes the entirety of import directives in a module.
importListNode :=
'(' IMPLIST import+ ')'
;
There are two types of import nodes:
IMPORT
, UNQIMP
.
import :=
importNode | unqImportNode
;
The IMPORT
node encodes a qualified import directive.
importNode :=
'(' IMPORT identList ')'
;
identList := identListNode ; /* terminal node */
The UNQIMP
node encodes an unqualified import directive.
unqImportNode :=
'(' UNQIMP moduleIdent identList ')'
;
The DEFLIST
node encodes one or more definitions.
definitionListNode :=
'(' DEFLIST definition+ ')'
;
There are four types of definition nodes:
CONSTDEF
, TYPEDEF
, VARDECL
, PROCDEF
.
definition :=
constDefNode | typeDefNode | varDeclNode | ProcDefNode
;
The CONSTDEF
node encodes a constant definition.
constDefNode :=
'(' CONSTDEF identNode exprNode ')'
;
The TYPEDEF
node encodes a type definition.
typeDefNode :=
'(' TYPEDEF identNode ( typeNode | emptyNode ) ')'
;
The PROCDEF
node encodes a procedure definition.
procDefNode :=
'(' PROCDEF identNode formalParamList returnType ')'
;
formalParamList :=
formalParamListNode | emptyNode
;
There are eleven nodes that may represent the definition part of a type definition or an anonymous type.
IDENT
, QUALIDENT
, SUBR
, ENUM
, SET
, ARRAY
, RECORD
, EXTREC
, VRNTREC
, POINTER
, PROCTYPE
.
typeNode :=
identNode | qualidentNode |
subrTypeNode | enumTypeNode | setTypeNode | arrayTypeNode | recTypeNode |
extRecTypeNode | vrntRecTypeNode | pointerTypeNode | procTypeNode
;
There are nine nodes that may represent the type definition part of a field definition or array base type.
IDENT
, QUALIDENT
, SUBR
, ENUM
, SET
, ARRAY
, RECORD
, POINTER
, PROCTYPE
.
fieldType :=
identNode | qualidentNode | subrTypeNode | enumTypeNode | setTypeNode |
arrayTypeNode | recTypeNode | pointerTypeNode | procTypeNode
;
There are three nodes that may represent the type definition part of a derived type.
IDENT
, QUALIDENT
, SUBR
.
derivedType :=
identNode | qualidentNode | subrTypeNode
;
There are three nodes that represent the type definition part of a record type.
IDENT
, QUALIDENT
, SUBR
.
recordType :=
recTypeNode | extRecTypeNode | vrntRecTypeNode
;
The SUBR
node encodes a subrange type definition.
subrTypeNode :=
'(' SUBR lowerBound upperBound subrBaseType ')'
;
lowerBound := exprNode ;
upperBound := exprNode ;
subrBaseType :=
identNode | qualidentNode | emptyNode
;
The ENUM
node encodes an enumeration type definition.
enumTypeNode :=
'(' ENUM identListNode ')'
;
The SET
node encodes a set type definition.
setTypeNode :=
'(' SET countableType ')'
;
countableType :=
qualidentNode | subrTypeNode | enumTypeNode
;
The ARRAY
node encodes an array type definition.
arrayTypeNode :=
'(' ARRAY indexTypeListNode arrayBaseType ')'
;
arrayBaseType := fieldType ;
The RECORD
node encodes a non-variant non-extensible record type definition.
recTypeNode :=
'(' RECORD fieldListSeqNode ')'
;
The POINTER
node encodes a pointer type definition.
pointerTypeNode :=
'(' POINTER typeNode ')'
;
The PROCTYPE
node encodes a procedure type definition.
procTypeNode :=
'(' PROCTYPE formalTypeList returnedType ')'
;
formalTypeList :=
formalTypeListNode | emptyNode
;
returnedType :=
identNode | qualidentNode | emptyNode
;
The EXTREC
node encodes an extensible record type definition.
extRecTypeNode :=
'(' EXTREC recBaseType fieldListSeqNode ')'
;
recBaseType :=
qualidentNode
;
The VRNTREC
node encodes a variant record type definition.
variantRecTypeNode :=
'(' VRNTREC variantFieldListSeqNode ')'
;
The INDEXLIST
node encodes one or more index types within an array type definition.
indexTypeListNode :=
'(' INDEXLIST indexType+ ')'
;
indexType := countableType ;
The FIELDLISTSEQ
node encodes a non-variant field list sequence within a record type definition.
fieldListSeqNode :=
'(' FIELDLISTSEQ fieldListNode+ ')'
;
The FIELDLIST
node encodes a non-variant field list within a field list sequence
fieldListNode :=
'(' FIELDLIST identListNode fieldType ')'
;
The VFLISTSEQ
node encodes a field list sequence within a variant record type definition.
variantFieldListSeqNode :=
'(' VFLISTSEQ ( fieldListNode | variantFieldListNode )+ ')'
;
The VFLIST
node encodes a variant field list within a variant record field list sequence.
variantFieldListNode :=
'(' VFLIST caseIdent caseType variantList defaultFieldListSeq ')'
;
caseIdent :=
identNode | emptyNode
;
caseType :=
qualidentNode
;
defaultFieldListSeq :=
fieldListSeqNode | emptyNode
;
The VARIANTLIST
node encodes a variant list within a variant field list.
variantList :=
'(' VARIANTLIST variantNode+ ')'
;
The VARIANT
node encodes a variant within a variant list.
variantNode :=
'(' VARIANT caseLabelListNode fieldListSeqNode ')'
;
The CLABELLIST
node encodes a case label list within a variant record definition or case statement.
caseLabelListNode :=
'(' CLABELLIST caseLabelsNode+ ')
;
The CLABELS
node encodes start and end labels within a case label list.
caseLabelsNode :=
'(' CLABELS startLabel endLabel ')'
;
startLabel := exprNode ;
endLabel := exprNode | emptyNode ;
The FTYPELIST
node encodes a formal type list within a procedure type definition.
formalTypeListNode :
'(' FTYPELIST formalType+ ')'
;
formalType :=
simpleFormalType | attrFormalType
;
simpleFormalType :=
typeIdent | openArrayTypeNode
;
attrFormalType :=
constAttrFormalTypeNode | varAttrFormalTypeNode
;
The OPENARRAY
node encodes an open array parameter within a formal type list.
openArrayTypeNode :=
'(' OPENARRAY typeIdent ')'
;
typeIdent :=
identNode | qualidentNode
;
The CONSTP
node encodes a CONST
parameter within a formal type list.
constAttrFormalTypeNode :=
'(' CONSTP simpleFormalType ')'
;
The VARP
node encodes a VAR
parameter within a formal type list.
varAttrFormalTypeNode :=
'(' VARP simpleFormalType ')'
;
The FPARAMLIST
node encodes a formal parameter list within a procedure type definition or procedure signature.
formalParamListNode :=
'(' FPARAMLIST formalParamsNode+ ')'
;
The FPARAMS
node encodes formal parameters within a formal parameter list.
formalParamsNode :=
'(' FPARAMS identListNode formalTypeNode ')'
;
The IMPMOD
node encodes an implementation module.
impModuleNode :=
'(' IMPMOD moduleIdent importListNode blockNode ')'
;
The BLOCK
node encodes a block within a module or procedure.
blockNode :=
'(' BLOCK declarationList body ')'
;
declarationList :=
declarationListNode | emptyNode
;
body :=
statementSeqNode | emptyNode
;
The DECLLIST
node encodes one or more declarations within a block.
declarationListNode :=
'(' DECLLIST declarationNode+ ')'
;
There are five types of declaration nodes:
CONSTDEF
, TYPEDECL
, VARDECL
, PROC
, MODDECL
.
declarationNode :=
constDefNode | typeDeclNode | varDeclNode | procDeclNode | modDeclNode
;
The TYPEDECL
node encodes a type declaration.
typeDeclNode :=
'(' TYPEDECL identNode ( typeNode | vsrTypeNode ) ')'
;
The VARDECL
node encodes a variable or field declaration.
varDeclNode :=
'(' VARDECL identListNode fieldType ')'
;
The PROC
node encodes a procedure declaration.
procDeclNode :=
'(' PROC identNode formalParamListNode returnedType blockNode ')'
;
The MODDECL
node encodes a local module declaration.
modDeclNode :=
'(' MODDECL moduleIdent importListNode exportList blockNode ')'
;
exportList :=
unqualExportNode | qualExportNode | emptyNode
;
The VSR
node encodes a variable size record type declaration.
vsrTypeNode :=
'(' VSR fieldListSeqNode varSizeFieldNode ')'
;
The VSFIELD
node encodes the indeterminate field of a variable size record type.
varSizeFieldNode :=
'(' VSFIELD varSizeField determinantField varSizeFieldType ')'
;
varSizeField := IdentNode ;
determinantField := IdentNode ;
varSizeFieldType := qualidentNode ;
The EXPORT
node encodes an unqualified export directive within a local module declaration.
unqualExportNode :=
'(' EXPORT identListNode ')'
;
The QUALEXP
node encodes a qualified export directive within a local module declaration.
qualExportNode :=
'(' QUALEXP identListNode ')'
;
The STMTSEQ
node encodes a statement sequence.
statementSeqNode :=
'(' STMTSEQ statementNode+ ')'
;
There are eleven types of statement nodes:
ASSIGN
, PCALL
, RETURN
, WITH
, IF
, SWITCH
, LOOP
, WHILE
, REPEAT
, FORTO
, EXIT
.
statementNode :=
ASSIGN | controlStmt | WITH
;
Nine statement nodes represent control statements:
controlStmt :=
PCALL | RETURN | IF | SWITCH | loopCtrlStmt
;
Five statement nodes represent loop control statements:
loopCtrlStatement :=
loopStmtNode | whileStmtNode | repeatStmtNode | ForToStmtNode | EXIT
;
The ASSIGN
node encodes an assignment statement.
assignmentNode :=
'(' ASSIGN designator exprNode ')'
;
designator :=
identNode | qualidentNode | derefNode | designatorNode
;
The PCALL
node encodes a procedure call statement.
procCallNode :=
'( PCALL designator actualParams ')'
;
actualParams :=
actualParamsNode | emptyNode
;
The RETURN
node encodes a RETURN
statement.
returnStmtNode :=
'(' RETURN returnValue ')'
;
returnValue :=
exprNode | emptyNode
;
The WITH
node encodes a WITH
statement.
withStmtNode :=
'(' WITH designator statementSeqNode ')'
;
The IF
node encodes an IF
statement.
ifStmtNode :=
'(' IF exprNode ifBranch elsifSeq elseBranch ')'
;
ifBranch := statementSeqNode ;
elsifSeq :=
elsifSeqNode | emptyNode
;
elseBranch :=
statementSeqNode | emptyNode
;
The SWITCH
node encodes a CASE
statement.
caseStmtNode :=
'(' SWITCH designator caseListNode elseBranch ')'
;
The LOOP
node encodes a LOOP
statement.
loopStmtNode :=
'(' LOOP statementSeqNode ')'
;
The WHILE
node encodes a WHILE
statement.
whileStmtNode :=
'(' WHILE exprNode statementSeqNode ')'
;
The REPEAT
node encodes a REPEAT
statement.
repeatStmtNode :=
'(' REPEAT statementSeqNode exprNode ')'
;
The FORTO
node encodes a FOR
statement.
forStmtNode :=
'(' FORTO identNode startValue endValue stepValue statementSeqNode ')'
;
startValue : exprNode ;
endValue := expNode ;
stepValue :=
exprNode | emptyNode
;
The EXIT
node encodes an EXIT
statement.
exitStmtNode :=
'(' EXIT ')'
;
The ARGS
node encodes actual parameters in a procedure or function call.
actualParamsNode :=
'(' ARGS exprNode+ ')'
;
The ELSIFSEQ
node encodes an ELSIF
sequence within an IF
statement.
elsifSeqNode :=
'(' ELSIFSEQ elsifNode+ ')'
;
The ELSIF
node encodes a single ELSIF
branch within an IF
statement.
elsifNode :=
'(' ELSIF exprNode statementSeqNode ')'
;
The CASELIST
node encodes a case list within a CASE
statement.
caseListNode :=
'(' CASELIST caseBranchNode+ ')'
;
The CASE
node encodes a case branch within a CASE
statement.
caseBranchNode :=
'(' CASE caseLabelListNode statementSeqNode ')'
;
The ELEMLIST
node encodes the element list within a set value.
elementListNode :=
'(' ELEMLIST element+ ')'
;
element :=
expr | range
;
The RANGE
node encodes a value range.
range :=
'(' RANGE lowerValue upperValue ')'
;
lowerValue := expr ;
upperValue := expr ;
The FIELD
node encodes a record field selector.
fieldSelectorNode :=
'(' FIELD selector ')'
;
selector :=
qualidentNode | designatorNode
;
The INDEX
node encodes an array subscript selector.
arrayIndexNode :=
'(' INDEX subscript+ ')'
;
subscript := exprNode ;
exprNode :=
negNode | notNode | eqNode | neqNode | ltNode | ltEqNode | gtNode |
gtEqNode | inNode | plusNode | minusNode | orNode | asteriskNode |
solidusNode | divNode | modulusNode | andNode | designator |
intValNode | realValNode | chrValNode | quotedValNode |
funcCallNode | setValNode
;
The DESIG
node encodes a designator.
designatorNode :=
'(' DESIG head tail ')'
;
head :=
qualidentNode | derefNode
;
tail :=
fieldSelectorNode | arrayIndexNode | emptyNode
;
The DEREF
node encodes a pointer dereference.
derefNode :=
'(' DEREF pointer ')'
;
pointer :=
qualident | derefNode | designatorNode
;
There are 25 nodes that may represent expressions or sub-expressions:
NOT
, AND
, OR
, EQ
, NEQ
, LT
, LTEQ
, GT
, GTEQ
, IN
,
NEG
, PLUS
, MINUS
, STAR
, SLASH
, DIV
, MOD
, SETDIFF
,
DESIG
, FCALL
, SETVAL
, INTVAL
, REALVAL
, CHRVAL
, QUOTEDVAL
.
expr :=
boolExpr | relationalExpr | arithmeticExpr |
DESIG | FCALL | SETVAL | literalValue
;
There are three boolean expression nodes:
NOT
, AND
, OR
.
boolExpr :=
NOT | AND | OR
;
There are seven relational expression nodes:
EQ
, NEQ
, LT
, LTEQ
, GT
, GTEQ
, IN
.
relationalExpr :=
EQ | NEQ | LT | LTEQ | GT | GTEQ | IN
;
There are eight arithmetic expression nodes:
NEG
, PLUS
, MINUS
, STAR
, SLASH
, DIV
, MOD
, SETDIFF
.
arithmeticExpr :=
NEG | PLUS | MINUS | STAR | SLASH | DIV | MOD | SETDIFF
;
There are four literal value expression nodes:
INTVAL
, REALVAL
, CHRVAL
, QUOTEDVAL
.
literalValue :=
INTVAL | REALVAL | CHRVAL | QUOTEDVAL
;
The NOT
node encodes an expression of the form NOT expr
.
notNode :=
'(' NOT right ')'
;
The AND
node encodes an expression of the form expr1 AND expr2
.
andNode :=
'(' AND left right ')'
;
The OR
node encodes an expression of the form expr1 OR expr2
.
orNode :=
'(' OR left right ')'
;
The EQ
node encodes an expression of the form expr1 = expr2
.
eqNode :=
'(' EQ left right ')'
;
left : exprNode ;
The NEQ
node encodes an expression of the form expr1 # expr2
.
neqNode :=
'(' NEQ left right ')'
;
The LT
node encodes an expression of the form expr1 < expr2
.
ltNode :=
'(' LT left right ')'
;
The LTEQ
node encodes an expression of the form expr1 <= expr2
.
ltEqNode :=
'(' LTEQ left right ')'
;
The GT
node encodes an expression of the form expr1 > expr2
.
gtNode :=
'(' GT left right ')'
;
The GTEQ
node encodes an expression of the form expr1 >= expr2
.
gtEqNode :=
'(' GTEQ left right ')'
;
The IN
node encodes an expression of the form expr1 IN expr2
.
inNode :=
'(' IN left right ')'
;
The NEG
node encodes an expression of the form - expr
.
negNode :=
'(' NEG right ')'
;
right : exprNode ;
The PLUS
node encodes an expression of the form expr1 + expr2
.
plusNode :=
'(' PLUS left right ')'
;
The MINUS
node encodes an expression of the form expr1 - expr2
.
minusNode :=
'(' MINUS left right ')'
;
The STAR
node encodes an expression of the form expr1 * expr2
.
starNode :=
'(' STAR left right ')'
;
The SLASH
node encodes an expression of the form expr1 / expr2
.
slashNode :=
'(' SLASH left right ')'
;
The DIV
node encodes an expression of the form expr1 DIV expr2
.
divNode :=
'(' DIV left right ')'
;
The MOD
node encodes an expression of the form expr1 MOD expr2
.
modulusNode :=
'(' MOD left right ')'
;
The SETDIFF
node encodes an expression of the form expr1 \ expr2
.
setDiffNode :=
'(' SETDIFF left right ')'
;
The FCALL
node encodes a function call expression.
funcCallNode :=
'( FCALL designator actualParams ')'
;
The SETVAL
node encodes a set value expression.
setValNode :=
'( SETVAL elementList setTypeIdent ')'
;
elementList :=
actualParams | emptyNode
;
setTypeIdent :=
ident | qualident | emptyNode
;
The FILENAME
node encodes the filename of the source file.
filenameNode :=
'(' FILENAME '"' filename '"' ')'
;
(FILENAME "parser.mod")
The OPTIONS
node encodes the compiler options used when compiling the source file.
optionsNode :=
'(' OPTIONS ( '"' option-name '"' )+ ')'
;
(OPTIONS "--pim4" "--no-synonyms" "--no-coroutines" ...)
The IDENT
node encodes an identifier.
identNode :=
'(' IDENT '"' Ident '"' ')'
;
(IDENT "foobar")
The IDENTLIST
node encodes an identifier list.
identListNode :=
'(' IDENTLIST ( '"' Ident '"' )+ ')'
;
(IDENTLIST "foo" "bar" "baz" ...)
The QUALIDENT
node encodes the component identifiers of a qualified identifier.
qualidentNode :=
'(' QUALIDENT ( '"' Ident '"' ) ( '"' Ident '"' )+ ')'
;
(QUALIDENT "foo" "bar" ...)
The INTVAL
node encodes a whole number value.
intValNode :=
'(' INTVAL ( lexeme | '#' lexeme ) ')'
;
(INTVAL 12345)
(INTVAL #0x7FFF)
The REALVAL
node encodes a real number value.
realValNode :=
'(' REALVAL lexeme ')'
;
(REALVAL 1.234)
(REALVAL 5.678e9)
The CHRVAL
node encodes a character code value.
chrValNode :=
'(' CHRVAL '#' lexeme ')'
;
(CHRVAL #0u7F)
The QUOTEDVAL
node encodes a quoted character or string value.
quotedValNode :=
'(' QUOTEDVAL '"' lexeme '"' ')'
;
(QUOTEDVAL "quoted character or string")
+++
Copyright (C) 2017 Modula-2 Software Foundation