Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from V0ldek/semantic-analysis
Semantic analysis.
Showing 131 changed files with 2,233 additions and 631 deletions.
@@ -1,3 +1,5 @@
# Changelog for Latte

## Unreleased changes
## 0.8

- Added the frontend including the Lexer, Parser, Rewriter and Analyser.
@@ -1,30 +1,7 @@
Copyright Author name here (c) 2020
Copyright 2020 Mateusz Gienieczko

All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

* Neither the name of Author name here nor the names of other
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,6 @@
all:
	stack build --copy-bins

clean:
	stack clean
	rm -f latc
@@ -1 +1,133 @@
# Latte
# Latte v0.8

Compiler of the [Latte programming language](https://www.mimuw.edu.pl/~ben/Zajecia/Mrj2020/Latte/description.html) written in Haskell.

## Compiling the project

Use `stack build` to compile the project. Use the `latc` executable to compile `.lat` source files.

The project uses GHC 8.8.4 (LTS 16.22). For the list of package dependencies, see `package.yaml`.

## Testing the project

Use `stack test` to run all included tests. The `lattest` directory contains all tests provided on
the [assignment page](https://www.mimuw.edu.pl/~ben/Zajecia/Mrj2020/latte-en.html) (unchanged)
as well as additional custom tests.

## Features

- Full lexer and parser for Latte with objects, virtual methods and arrays.
- Full semantic analysis with type checks and reachability analysis.

## Custom extensions

The grammar has been extended with a `var` type declaration that can be used when declaring local variables; the variable's
type is then inferred as the compile-time type of its initialiser expression. `var` cannot be used in any other context, nor in
a declaration without an initialiser. It exists mainly so that the `for` loop rewrite works correctly, but since it is also
useful as a language feature, it is exposed to the user.
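
A minimal sketch of how a `var` declaration could be typed, assuming a toy type and expression representation. `Type`, `Expr`, `typeOf` and `checkLocalDecl` are hypothetical names rather than the compiler's actual API, and class subtyping is ignored.

```haskell
import qualified Data.Map as Map

-- Hypothetical, simplified types; the real compiler's AST is richer.
-- TVar stands for the `var` keyword in a declaration.
data Type = TInt | TBool | TArr Type | TVar
  deriving (Eq, Show)

data Expr = ELitInt Integer | ELitBool Bool | EVar String | ENewArr Type Expr
  deriving Show

type Env = Map.Map String Type

-- Compile-time type of an initialiser expression.
typeOf :: Env -> Expr -> Either String Type
typeOf _ (ELitInt _) = Right TInt
typeOf _ (ELitBool _) = Right TBool
typeOf env (EVar x) = maybe (Left ("undeclared " ++ x)) Right (Map.lookup x env)
typeOf env (ENewArr t n) = do
  tn <- typeOf env n
  if tn == TInt then Right (TArr t) else Left "array size must be int"

-- Resolve the type of a local declaration `t x = e;` or `t x;`.
-- `var` takes the initialiser's type and is rejected without one.
checkLocalDecl :: Env -> Type -> Maybe Expr -> Either String Type
checkLocalDecl _ TVar Nothing = Left "var requires an initialiser"
checkLocalDecl env TVar (Just e) = typeOf env e
checkLocalDecl _ t Nothing = Right t
checkLocalDecl env t (Just e) = do
  te <- typeOf env e
  if te == t then Right t else Left "type mismatch"
```

With `xs` bound to an `int[]`, `var ys = xs;` gets the type `int[]`, while a bare `var ys;` is rejected.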

## Compilation process

Lexing and parsing are automated using BNFC; after that, the compilation proceeds in phases.

### Phase one - syntactical rewrite

The `Syntax.Rewriter` module rewrites the parsed syntax tree into a desugared, simplified version of the code.
The most important jobs performed by the Rewriter are rewriting `for` loops and computing constant expressions.

#### `for` loops

A `for` loop has the general form:

```
for (<type> <loop_var_ident> : <expr>)
    <stmt>
```

This is desugared into a sequence of simpler statements with semantics equivalent to:

```
{
    var ~l_arr = <expr>;
    int ~l_idx = 0;
    while (~l_idx < ~l_arr.length) {
        <type> <loop_var_ident> = ~l_arr[~l_idx];
        <stmt>
        ~l_idx++;
    }
}
```

Note: all identifiers starting with the tilde character (`~`) are internal identifiers used by the compiler.
By design, they cannot be expressed under the lexical rules of Latte, so they can only be accessed by generated code
and never by user-supplied source.
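
The rewrite can be sketched as a function over a toy statement AST. The constructors and `desugarFor` are hypothetical stand-ins for the real `Syntax.Abs` types and `Syntax.Rewriter` code; the sketch only shows the shape of the transformation, including the index increment that advances the loop.

```haskell
-- Hypothetical toy AST; not the compiler's real Syntax.Abs types.
data Type = TInt | TVar
  deriving (Eq, Show)

data Expr
  = EVar String
  | EIntLit Integer
  | ELess Expr Expr
  | EField Expr String -- e.g. ~l_arr.length
  | EIndex Expr Expr   -- e.g. ~l_arr[~l_idx]
  deriving (Eq, Show)

data Stmt
  = SBlock [Stmt]
  | SDecl Type String Expr
  | SIncr String
  | SWhile Expr Stmt
  | SFor Type String Expr Stmt
  deriving (Eq, Show)

-- Internal identifiers start with '~', so they cannot clash with user names.
arrIdent, idxIdent :: String
arrIdent = "~l_arr"
idxIdent = "~l_idx"

-- Rewrite `for (t x : e) body` into a block containing a while loop,
-- recursing so that nested for loops are rewritten as well.
desugarFor :: Stmt -> Stmt
desugarFor (SFor t x e body) =
  SBlock
    [ SDecl TVar arrIdent e
    , SDecl TInt idxIdent (EIntLit 0)
    , SWhile
        (ELess (EVar idxIdent) (EField (EVar arrIdent) "length"))
        (SBlock
          [ SDecl t x (EIndex (EVar arrIdent) (EVar idxIdent))
          , desugarFor body
          , SIncr idxIdent
          ])
    ]
desugarFor (SBlock ss) = SBlock (map desugarFor ss)
desugarFor (SWhile c s) = SWhile c (desugarFor s)
desugarFor s = s
```

Declaring the array as `var` is what lets the rewrite avoid computing the element type of `<expr>` itself; the Analyser infers it later.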

#### Constant expressions

Expressions that are computable at compile time are replaced during this phase by their result.
This includes simple arithmetic and relational expressions whose atomic components are all constants.
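
A minimal sketch of this kind of constant folding over a hypothetical toy expression type (`Expr`, `foldConst` and `arith` are illustrative names, not the Rewriter's API). It uses unbounded `Integer` arithmetic and deliberately omits division; the real compiler may handle overflow and division by zero differently.

```haskell
-- Hypothetical toy expression type; the real Rewriter works on Syntax.Abs.
data Expr
  = EInt Integer
  | EBool Bool
  | EVar String
  | EAdd Expr Expr
  | EMul Expr Expr
  | ELt Expr Expr
  | EAnd Expr Expr
  deriving (Eq, Show)

-- Fold bottom-up: an operator collapses to a literal only when all of its
-- (already folded) operands are literals; anything else is rebuilt as-is.
foldConst :: Expr -> Expr
foldConst (EAdd a b) = arith (+) EAdd (foldConst a) (foldConst b)
foldConst (EMul a b) = arith (*) EMul (foldConst a) (foldConst b)
foldConst (ELt a b) = case (foldConst a, foldConst b) of
  (EInt x, EInt y) -> EBool (x < y)
  (a', b') -> ELt a' b'
foldConst (EAnd a b) = case (foldConst a, foldConst b) of
  (EBool x, EBool y) -> EBool (x && y)
  (a', b') -> EAnd a' b'
foldConst e = e

-- Apply an integer operator if both operands are literals, else rebuild the node.
arith :: (Integer -> Integer -> Integer) -> (Expr -> Expr -> Expr) -> Expr -> Expr -> Expr
arith op _ (EInt x) (EInt y) = EInt (op x y)
arith _ mk a b = mk a b
```

For instance, `2 + 3 < 2 * 3` folds to the literal `true`, while `x + 2 * 3` only has its constant subterm folded, giving `x + 6`.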

### Phase two - toplevel metadata

The `SemanticAnalysis.Toplevel` module processes the definitions of classes, methods and toplevel functions
and converts them into metadata: the field and method tables of all classes and their
inheritance hierarchies. The important jobs in this phase are:

- resolving inheritance hierarchies and asserting that they contain no cycles;
- creating class method tables, taking method overriding into account;
- checking field, method and formal parameter names for duplicates;
- wrapping all toplevel functions into the special `~cl_TopLevel` class.
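
The first two jobs can be sketched with `Data.Map` from `containers`. `ClassDef`, `ancestry` and `methodTable` are hypothetical; the real metadata also records fields, method signatures and source positions.

```haskell
import qualified Data.Map as Map

-- Hypothetical class metadata: optional base class and the names of
-- methods declared directly in the class body.
data ClassDef = ClassDef
  { clBase :: Maybe String
  , clMethods :: [String]
  }
  deriving Show

type Classes = Map.Map String ClassDef

-- Walk the base-class chain (derived class first, root last),
-- failing on an unknown class or on a cycle.
ancestry :: Classes -> String -> Either String [String]
ancestry classes = go []
  where
    go seen c
      | c `elem` seen = Left ("inheritance cycle through " ++ c)
      | otherwise = case Map.lookup c classes of
          Nothing -> Left ("unknown class " ++ c)
          Just def -> case clBase def of
            Nothing -> Right (reverse (c : seen))
            Just base -> go (c : seen) base

-- Method table mapping each method name to the class that implements it.
-- Map.unions is left-biased, so the most derived definition wins (overriding).
methodTable :: Classes -> String -> Either String (Map.Map String String)
methodTable classes c = do
  chain <- ancestry classes c
  let layer cls =
        Map.fromList [(m, cls) | m <- maybe [] clMethods (Map.lookup cls classes)]
  pure (Map.unions (map layer chain))
```

With `Circle` extending `Shape` and overriding `area`, the table maps `area` to `Circle` and `name` to `Shape`; two classes extending each other are reported as a cycle.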

### Phase three - semantic analysis

The `SemanticAnalysis.Analyser` module computes type annotations and symbol tables,
and performs control flow analysis. This is the biggest part of the frontend and contains all typing,
scoping and control flow rules.

The scoping rules are straightforward. No two symbols can be declared with the same identifier in the same scope.
Each block introduces a new scope. Additionally, every `if`, `while` and `for` statement introduces an implicit
block around its body, even if the body is a single statement. This prevents code like:

```
if (cond) int x = 0;
return x;
```
from compiling.
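
A minimal sketch of these scoping rules as a stack of symbol tables. The names (`Scopes`, `enterBlock`, `exitBlock`, `declare`, `lookupSym`) are hypothetical and types are plain strings. Shadowing a name from an enclosing scope is allowed here, since the rule only forbids duplicates within a single scope.

```haskell
import qualified Data.Map as Map

-- Innermost scope first; each scope maps an identifier to its type
-- (types are plain strings to keep the sketch small).
type Scopes = [Map.Map String String]

-- Entering any block, including the implicit block of an if/while/for body,
-- pushes a fresh scope; leaving it pops that scope.
enterBlock :: Scopes -> Scopes
enterBlock scopes = Map.empty : scopes

exitBlock :: Scopes -> Scopes
exitBlock = drop 1

-- A declaration fails only if the name already exists in the innermost scope.
declare :: String -> String -> Scopes -> Either String Scopes
declare _ _ [] = Left "no open scope"
declare x t (inner : outer)
  | Map.member x inner = Left ("redeclaration of " ++ x)
  | otherwise = Right (Map.insert x t inner : outer)

-- Lookup searches from the innermost scope outwards.
lookupSym :: String -> Scopes -> Maybe String
lookupSym x scopes =
  case [t | scope <- scopes, Just t <- [Map.lookup x scope]] of
    (t : _) -> Just t
    [] -> Nothing
```

In the `if (cond) int x = 0;` example, `x` lives in the implicit block's scope, which is popped before `return x;` is checked, so the lookup fails.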

The type rules introduce internal types for functions and a `Ref t` type, which is a reference to a symbol of type `t`.
The type rules never wrap a type in more than one `Ref` layer. The `Ref` layer is used to distinguish
between l-values and r-values: an assignment is valid if and only if its left-hand side has a `Ref` type. For the actual
typing rules, consult the code in `SemanticAnalysis.Analyser`.
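
A minimal sketch of the `Ref` idea, assuming a reduced type language and ignoring class subtyping. `Type`, `ref`, `deref` and `checkAssign` are hypothetical names. The left-hand side must be a `Ref`, while the right-hand side is read as a value by stripping its `Ref` layer.

```haskell
-- Hypothetical, reduced type language.
data Type = TInt | TBool | TStr | TRef Type
  deriving (Eq, Show)

-- Wrap a type in Ref, never producing Ref (Ref t).
ref :: Type -> Type
ref t@(TRef _) = t
ref t = TRef t

-- The r-value type of an expression: strip the Ref layer, if any.
deref :: Type -> Type
deref (TRef t) = t
deref t = t

-- `lhs = rhs` is valid iff lhs is an l-value (a Ref) and the value of rhs
-- has the referenced type.
checkAssign :: Type -> Type -> Either String ()
checkAssign (TRef t) rhs
  | deref rhs == t = Right ()
  | otherwise = Left ("expected " ++ show t ++ ", got " ++ show (deref rhs))
checkAssign lhs _ = Left ("not an l-value: " ++ show lhs)
```

So `x = y;` type-checks when both are `int` variables, but `1 = x;` fails because a literal has a plain, non-`Ref` type.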

Currently, the control flow rules only concern `return` statements: every non-void function is required
to have a `return` statement on each possible execution path. The Analyser tracks the reachability of statements and combines
the reachability of the branches of `if` and `while` statements. An important optimisation is that if the condition of a conditional
statement is trivially true (or false), the branch is considered to be always (or never) entered. A condition is trivially true
or false if it is a `true` or `false` literal; since this phase runs after the Rewriter, all constant boolean expressions
have already been collapsed to a single literal.
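
A sketch of the reachability computation in a monoid style, assuming a tiny statement language whose conditions the Rewriter has already reduced to `true`, `false` or an unknown value. Sequencing combines with the `All` monoid (a block falls through only if every statement does), and branching with `Any` (an `if` falls through if either branch does). This is a simplified illustration, not the Analyser's actual monoid. Latte has no `break`, so `while (true)` never exits normally.

```haskell
import Data.Monoid (All (..), Any (..))

-- Conditions after constant folding: a literal or something unknown.
data Cond = CTrue | CFalse | CUnknown

-- Hypothetical minimal statement language.
data Stmt
  = SReturn
  | SOther                       -- any statement that cannot return
  | SBlock [Stmt]
  | SIf Cond Stmt (Maybe Stmt)   -- optional else branch
  | SWhile Cond Stmt

-- True if execution may continue past the end of the statement.
fallsThrough :: Stmt -> Bool
fallsThrough SReturn = False
fallsThrough SOther = True
fallsThrough (SBlock ss) = getAll (foldMap (All . fallsThrough) ss)
fallsThrough (SIf CTrue t _) = fallsThrough t                 -- always entered
fallsThrough (SIf CFalse _ e) = maybe True fallsThrough e     -- never entered
fallsThrough (SIf CUnknown t e) =
  getAny (Any (fallsThrough t) <> Any (maybe True fallsThrough e))
fallsThrough (SWhile CTrue _) = False  -- no break in Latte: never exits normally
fallsThrough (SWhile _ _) = True       -- the body may be skipped entirely

-- A non-void function body is accepted only if no path falls off its end.
returnsOnAllPaths :: Stmt -> Bool
returnsOnAllPaths = not . fallsThrough
```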

## Grammar conflicts

The grammar contains 3 shift/reduce conflicts.

The first conflict is the classic dangling-else problem of single-statement `if` statements, which makes statements of the following form ambiguous:
```
if cond1
if cond2
stmt1
else
stmt2
```

The second conflict is the ambiguity between `(new t) [n]` and `new t[n]`: the parser cannot distinguish between the creation of an array
and the instantiation of a type immediately followed by indexing into it. The conflict is correctly resolved in favour of array creation.

The third conflict is between a parenthesised single expression and a cast `null` expression, which is correctly resolved in favour of the `null` expression.

## Sources

A few parts of the code were copied directly from, or heavily inspired by, my earlier work on the Harper language (https://github.com/V0ldek/Harper),
most notably the monoid-based approach to control flow analysis.

The grammar rules for `null` literals are a slightly modified version of rules proposed by Krzysztof Małysa.
@@ -1,14 +1,33 @@
{-# LANGUAGE FlexibleInstances #-}
module Error where

import Syntax.Abs (Pos, Positioned (..))
import Syntax.Printer (Print, printTree)
import Data.Maybe
import Syntax.Abs (Pos, Positioned (..), Unwrappable (..))
import Syntax.Code

errorMsg :: String -> Pos -> String -> String
class WithContext a where
  getCtx :: a -> String

instance WithContext Code where
  getCtx = codeString

instance WithContext (Maybe Code) where
  getCtx c = maybe "" getCtx c

errorMsg :: Positioned a => String -> a -> String -> String
errorMsg msg a ctx = msg ++ "\n" ++ ctxMsg
  where ctxMsg = lineInfo a ++ ":\n" ++ ctx
  where
    ctxMsg = lineInfo (pos a) ++ ":\n" ++ ctx

errorMsgMb :: Positioned a => String -> Maybe a -> Maybe String -> String
errorMsgMb msg a ctx = msg ++ "\n" ++ ctxMsg
  where
    ctxMsg = lineInfo (a >>= pos) ++ ":\n" ++ fromMaybe "" ctx

errorCtxMsg :: (Positioned a, Print a) => String -> a -> String
errorCtxMsg msg ctx = errorMsg msg (pos ctx) (printTree ctx)
errorCtxMsg :: (Positioned a, WithContext a, Unwrappable f) => String -> f a -> String
errorCtxMsg msg ctx = errorMsg msg (unwrap ctx) (getCtx $ unwrap ctx)

lineInfo :: Pos -> String
lineInfo (ln, ch) = "Line " ++ show ln ++ ", character " ++ show ch
lineInfo :: Maybe Pos -> String
lineInfo pos = case pos of
  Nothing -> ""
  Just (ln, ch) -> "Line " ++ show ln ++ ", character " ++ show ch
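
The message format produced by the new `errorMsg` and `lineInfo` can be checked in isolation. The sketch below replaces `Syntax.Abs` with a hypothetical minimal `Positioned` class and `Node` type. It assumes `pos` returns `Maybe Pos`, as the call `lineInfo (pos a)` implies, and that `Pos` is a `(line, character)` pair of `Int`s, consistent with the `Just (ln, ch)` pattern.

```haskell
-- Hypothetical stand-ins for Syntax.Abs; the real definitions may differ.
type Pos = (Int, Int)

class Positioned a where
  pos :: a -> Maybe Pos

-- A toy syntax node that may or may not carry a source position.
newtype Node = Node (Maybe Pos)

instance Positioned Node where
  pos (Node p) = p

-- Same formatting logic as the new lineInfo and errorMsg in the diff above.
lineInfo :: Maybe Pos -> String
lineInfo p = case p of
  Nothing -> ""
  Just (ln, ch) -> "Line " ++ show ln ++ ", character " ++ show ch

errorMsg :: Positioned a => String -> a -> String -> String
errorMsg msg a ctx = msg ++ "\n" ++ ctxMsg
  where
    ctxMsg = lineInfo (pos a) ++ ":\n" ++ ctx
```

With a position, the message is the error text, then `Line 3, character 7:`, then the context. Without one, the position line is reduced to a lone `:`.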
@@ -0,0 +1,31 @@
-- Reserved identifiers used internally by the compiler.
-- Any identifier starting with '~' is meant to be invisible
-- to user code and unspeakable using the lexical rules of the language.
module Identifiers where

import Syntax.Abs

-- Identifier of the class that wraps toplevel functions.
topLevelClassIdent :: Ident
topLevelClassIdent = Ident "~cl_TopLevel"

-- Internal identifier of the method currently being compiled.
currentMthdSymIdent :: Ident
currentMthdSymIdent = Ident "~mthd_current"

-- Identifiers used in for loop translation.
forArrayIdent :: Ident
forArrayIdent = Ident "~l_arr"

forIndexIdent :: Ident
forIndexIdent = Ident "~l_idx"


selfSymIdent :: Ident
selfSymIdent = Ident "self"

arrayLengthIdent :: Ident
arrayLengthIdent = Ident "length"

reservedNames :: [Ident]
reservedNames = [selfSymIdent]
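
The claim that `~`-prefixed names are unspeakable can be illustrated with an approximation of BNFC's built-in `Ident` token (a letter followed by letters, digits, `_` or `'`). The helper names below are hypothetical. `Data.Char.isAlpha` accepts more Unicode letters than BNFC does, and `canDeclare` assumes `reservedNames` is used to reject user declarations of those names, which this diff does not show.

```haskell
import Data.Char (isAlpha, isAlphaNum)

-- Approximation of a BNFC-style identifier token: a letter followed by
-- letters, digits, '_' or '\''. No such token can start with '~'.
isUserIdent :: String -> Bool
isUserIdent (c : cs) = isAlpha c && all identChar cs
  where
    identChar d = isAlphaNum d || d == '_' || d == '\''
isUserIdent [] = False

-- Internal compiler identifiers (e.g. "~l_arr", "~cl_TopLevel") start with '~'.
isInternal :: String -> Bool
isInternal ('~' : _) = True
isInternal _ = False

-- Mirrors reservedNames = [selfSymIdent] from the module above.
reservedNames :: [String]
reservedNames = ["self"]

-- Hypothetical check: a user may declare a name only if it lexes as an
-- identifier and is not reserved.
canDeclare :: String -> Bool
canDeclare x = isUserIdent x && x `notElem` reservedNames
```

Note that `length` is not in `reservedNames`, so under this assumption it remains a legal user identifier.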