Merge pull request #1 from V0ldek/semantic-analysis
Semantic analysis.
V0ldek authored Dec 16, 2020
2 parents 1245363 + f35b193 commit 7552423
Showing 131 changed files with 2,233 additions and 631 deletions.
4 changes: 3 additions & 1 deletion ChangeLog.md
@@ -1,3 +1,5 @@
# Changelog for Latte

## Unreleased changes
## 0.8

- Added the frontend including the Lexer, Parser, Rewriter and Analyser.
31 changes: 4 additions & 27 deletions LICENSE
@@ -1,30 +1,7 @@
Copyright Author name here (c) 2020
Copyright 2020 Mateusz Gienieczko

All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.

* Neither the name of Author name here nor the names of other
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
18 changes: 11 additions & 7 deletions Latte.cabal
@@ -4,17 +4,17 @@ cabal-version: 1.12
--
-- see: https://github.com/sol/hpack
--
-- hash: 4430bb9e814179e64c9ec984db92aeb927e5057861af8e70424ca3e88bd2d911
-- hash: 07a5b34f37f9b22ff69be83173953e37bf15964dd876bd3357dcc6e7e900a54a

name: Latte
version: 0.1.0.0
description: Please see the README on GitHub at <https://github.com/githubuser/Latte#readme>
version: 0.8.0.0
description: Please see the README on GitHub at <https://github.com/V0ldek/Latte#readme>
homepage: https://github.com/githubuser/Latte#readme
bug-reports: https://github.com/githubuser/Latte/issues
author: Author name here
maintainer: example@example.com
copyright: 2020 Author name here
license: BSD3
author: Mateusz Gienieczko
maintainer: matgienieczko@gmail.com
copyright: 2020 Mateusz Gienieczko
license: MIT
license-file: LICENSE
build-type: Simple
extra-source-files:
@@ -29,13 +29,17 @@ library
exposed-modules:
ErrM
Error
Identifiers
SemanticAnalysis.Analyser
SemanticAnalysis.Class
SemanticAnalysis.ControlFlow
SemanticAnalysis.Toplevel
Syntax.Abs
Syntax.Code
Syntax.Lexer
Syntax.Parser
Syntax.Printer
Syntax.Rewriter
other-modules:
Paths_Latte
hs-source-dirs:
6 changes: 6 additions & 0 deletions Makefile
@@ -0,0 +1,6 @@
all:
stack build --copy-bins

clean:
stack clean
rm -f latc
134 changes: 133 additions & 1 deletion README.md
@@ -1 +1,133 @@
# Latte
# Latte v0.8

Compiler of the [Latte programming language](https://www.mimuw.edu.pl/~ben/Zajecia/Mrj2020/Latte/description.html) written in Haskell.

## Compiling the project

Use `stack build` to compile the project. Use the `latc` executable to compile `.lat` source files.

The project uses GHC 8.8.4 (LTS 16.22). For the list of packages used, consult `package.yaml`.

## Testing the project

Use `stack test` to run all included tests. The `lattest` directory contains all tests provided on
the [assignment page](https://www.mimuw.edu.pl/~ben/Zajecia/Mrj2020/latte-en.html) (these are unchanged)
and additional custom tests.

## Features

- Full lexer and parser for Latte with objects, virtual methods and arrays.
- Full semantic analysis with type checks and reachability analysis.

## Custom extensions

The grammar has been extended to include a `var` type declaration that can be used when declaring local variables, so that
their type is inferred from the compile-time type of the initialiser expression. It cannot be used in any other context and it cannot be
used in declarations without an initialiser. The main motivation is to make the `for` loop rewrite work correctly,
but it is also useful as a language feature, so it is exposed to the user.

## Compilation process

After lexing and parsing, which are automated using BNFC, the compilation proceeds in phases.

### Phase one - syntactical rewrite

The `Syntax.Rewriter` module rewrites the parsed syntax tree into a desugarised, simplified version of the code.
The most important jobs performed by the Rewriter are rewriting `for` loops and computing constant expressions.

#### `for` loops

A `for` loop has the general form of:

```
for (<type> <loop_var_ident> : <expr>)
<stmt>
```

This is desugarised into a sequence of simpler statements with semantics equivalent to:

```
{
var ~l_arr = <expr>;
int ~l_idx = 0;
while (~l_idx < ~l_arr.length) {
<type> <loop_var_ident> = ~l_arr[~l_idx];
<stmt>
~l_idx++;
}
}
```

Note: all identifiers starting with the tilde character (`~`) are internal identifiers used by the compiler.
By design, these cannot be expressed using the lexical rules of Latte, so they can only be accessed by generated code
and not by user-supplied source.
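
The rewrite described above might be sketched roughly as follows. This is a minimal illustration over a hypothetical, simplified AST (`Expr`, `Stmt` and `desugarFor` are made-up names, not the types from `Syntax.Abs` or the actual `Syntax.Rewriter` code). The sketch also assumes the index is incremented after the loop body, which the loop needs in order to terminate.

```haskell
-- Hypothetical, simplified AST; the real one is generated by BNFC.
data Expr
    = EVar String
    | EInt Integer
    | ELt Expr Expr
    | EField Expr String
    | EIndex Expr Expr
    deriving (Eq, Show)

data Stmt
    = Decl String String Expr      -- type name, identifier, initialiser
    | Incr String                  -- identifier++
    | While Expr Stmt
    | For String String Expr Stmt  -- element type, loop variable, array, body
    | Block [Stmt]
    | Other String                 -- any statement this sketch does not inspect
    deriving (Eq, Show)

-- Rewrites every `for` loop into a block with a `while` loop.
desugarFor :: Stmt -> Stmt
desugarFor (For t x arr body) = Block
    [ Decl "var" "~l_arr" arr
    , Decl "int" "~l_idx" (EInt 0)
    , While (ELt (EVar "~l_idx") (EField (EVar "~l_arr") "length")) $ Block
        [ Decl t x (EIndex (EVar "~l_arr") (EVar "~l_idx"))
        , desugarFor body
        , Incr "~l_idx"            -- assumption: advance the index each iteration
        ]
    ]
desugarFor (While c s) = While c (desugarFor s)
desugarFor (Block ss)  = Block (map desugarFor ss)
desugarFor s           = s

main :: IO ()
main = print (desugarFor (For "int" "x" (EVar "xs") (Other "printInt(x);")))
```

Nested `for` loops reuse the same internal names; in this sketch that is harmless because each rewritten loop lives in its own block and therefore its own scope.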

#### Constant expressions

Expressions that are computable at compile-time are rewritten during this phase into their result.
This includes simple arithmetic expressions and relational operations that contain only constants
as their atomic components.
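
As an illustration of this kind of rewrite, here is a minimal constant-folding sketch. The `Expr` type and the `fold` function are hypothetical and much smaller than the real syntax tree, and the sketch uses unbounded `Integer` arithmetic, so it ignores any overflow rules the actual compiler may apply.

```haskell
-- Hypothetical expression type, for illustration only.
data Expr
    = IntLit Integer
    | BoolLit Bool
    | Var String
    | Add Expr Expr
    | Mul Expr Expr
    | Lt Expr Expr
    | And Expr Expr
    deriving (Eq, Show)

-- Folds subexpressions bottom-up; anything containing a variable is kept.
fold :: Expr -> Expr
fold (Add a b) = arith Add (+) (fold a) (fold b)
fold (Mul a b) = arith Mul (*) (fold a) (fold b)
fold (Lt a b)  = case (fold a, fold b) of
    (IntLit x, IntLit y) -> BoolLit (x < y)
    (a', b')             -> Lt a' b'
fold (And a b) = case (fold a, fold b) of
    (BoolLit x, BoolLit y) -> BoolLit (x && y)
    (a', b')               -> And a' b'
fold e = e

-- Applies the operator if both operands are literals, rebuilds the node otherwise.
arith :: (Expr -> Expr -> Expr) -> (Integer -> Integer -> Integer)
      -> Expr -> Expr -> Expr
arith _ op (IntLit x) (IntLit y) = IntLit (x `op` y)
arith con _ a b                  = con a b

main :: IO ()
main = do
    print (fold (Lt (Add (IntLit 1) (IntLit 2)) (IntLit 4)))       -- BoolLit True
    print (fold (Add (Var "x") (Mul (IntLit 2) (IntLit 3))))       -- Add (Var "x") (IntLit 6)
```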

### Phase two - toplevel metadata

The `SemanticAnalysis.Toplevel` module parses definitions of classes, methods and toplevel functions
and converts them to metadata containing field and method tables of all classes and their
inheritance hierarchies. The important jobs in this phase are:

- resolving inheritance hierarchies and asserting they contain no cycles;
- creating class method tables, taking method overriding into account;
- analysing field, method and formal parameter names asserting there are no duplicates;
- wrapping all toplevel functions into the special `~cl_TopLevel` class.
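
The first of these jobs, checking that inheritance hierarchies contain no cycles, could look something like the sketch below. It assumes single inheritance represented as a plain association list from a class to its base class; `findCycle` and `hierarchyCycles` are hypothetical names and not the functions used in `SemanticAnalysis.Toplevel`.

```haskell
import Data.Maybe (mapMaybe)

type ClassName = String

-- Follows the chain of base classes starting from one class.
-- Returns the visited chain if it ever comes back to a class already seen.
findCycle :: [(ClassName, ClassName)] -> ClassName -> Maybe [ClassName]
findCycle parents start = go [start] start
  where
    go seen c = case lookup c parents of
        Nothing -> Nothing                          -- reached a root class
        Just p
            | p `elem` seen -> Just (reverse (p : seen))
            | otherwise     -> go (p : seen) p

-- All offending chains in a hierarchy; an empty list means it is acyclic.
hierarchyCycles :: [(ClassName, ClassName)] -> [[ClassName]]
hierarchyCycles parents = mapMaybe (findCycle parents . fst) parents

main :: IO ()
main = do
    print (hierarchyCycles [("B", "A"), ("C", "B")])
    print (findCycle [("A", "B"), ("B", "A")] "A")
```

The walk terminates because every step either stops or adds a class not seen before, and the number of classes is finite.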

### Phase three - semantic analysis

The `SemanticAnalysis.Analyser` module computes type annotations, computes symbol tables
and performs control flow analysis. This is the biggest part of the frontend and contains all typing,
scoping and control flow rules.

The scoping rules are straightforward. No two symbols can be declared with the same identifier in the same scope.
Each block introduces a new scope. Additionally, every `if`, `while` and `for` statement introduces an implicit
block, even if its body is a single statement. This is to prevent code like:

```
if (cond) int x = 0;
return x;
```
from compiling.
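
A minimal sketch of these scoping rules, assuming the environment is modelled as a stack of scopes with the innermost one first. The names `Env`, `enterBlock`, `leaveBlock`, `declare` and `resolve` are hypothetical and are not taken from `SemanticAnalysis.Analyser`.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

type Scope = [(String, String)]  -- identifier and its type name
type Env   = [Scope]             -- innermost scope first

enterBlock :: Env -> Env
enterBlock env = [] : env

leaveBlock :: Env -> Env
leaveBlock = drop 1

-- Declaring fails only if the identifier already exists in the innermost scope.
declare :: String -> String -> Env -> Either String Env
declare x t [] = Right [[(x, t)]]
declare x t (scope : outer)
    | any ((== x) . fst) scope = Left ("redeclaration of " ++ x)
    | otherwise                = Right (((x, t) : scope) : outer)

-- Looks the identifier up from the innermost scope outwards.
resolve :: String -> Env -> Maybe String
resolve x env = listToMaybe (mapMaybe (lookup x) env)

main :: IO ()
main = do
    -- `if (cond) int x = 0;` declares x inside the implicit block of the `if`;
    -- after leaving that block, `return x;` can no longer see it.
    let outer = [[]]
    print (resolve "x" . leaveBlock <$> declare "x" "int" (enterBlock outer))
    -- prints: Right Nothing
```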

The type rules introduce internal types for functions and a `Ref t` type, which is a reference to a symbol of type `t`.
The type rules do not cause a type to be wrapped in more than one `Ref` layer. The `Ref` layer is used to distinguish
between l-values and r-values: an assignment is valid if and only if its left-hand side is a `Ref` type. For the actual
typing rules consult the code in `SemanticAnalysis.Analyser`.
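
The role of the `Ref` layer can be illustrated with the following sketch. The `Type` constructors and the helpers `ref`, `deref` and `checkAssign` are hypothetical, and the check uses plain type equality, so it leaves out subtyping between classes and whatever other conditions the real rules impose.

```haskell
-- Hypothetical, simplified type representation.
data Type = TInt | TBool | TStr | Ref Type
    deriving (Eq, Show)

-- Smart constructor: a type is never wrapped in more than one Ref layer.
ref :: Type -> Type
ref t@(Ref _) = t
ref t         = Ref t

-- Reading a value strips the Ref layer, if there is one.
deref :: Type -> Type
deref (Ref t) = t
deref t       = t

-- An assignment needs an l-value (a Ref) on the left-hand side.
checkAssign :: Type -> Type -> Either String ()
checkAssign (Ref t) rhs
    | t == deref rhs = Right ()
    | otherwise      = Left "type mismatch in assignment"
checkAssign _ _      = Left "left-hand side is not an l-value"

main :: IO ()
main = do
    print (checkAssign (ref TInt) TInt)  -- Right ()
    print (checkAssign TInt TInt)        -- Left "left-hand side is not an l-value"
```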

The control flow rules currently concern themselves with function return statements. Any non-void function is required
to have a `return` statement on each possible execution path. The Analyser tracks reachability of statements and combines
branch reachability of `if` and `while` statements. An important optimisation is that if a condition of a conditional
statement is trivially true (or false) the branch is considered to be always (or never) entered. Trivially true or false
means that it is either a true or false literal, but since this phase is performed after the Rewriter all constant boolean expressions
are already collapsed to a single literal.
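
One way to picture the reachability tracking is a small monoid that is combined along sequences of statements, with a separate merge for branches. The sketch below is only an assumption about the general shape: `Reach`, `Cond`, `Stmt`, `branch` and `reach` are made-up names, and the treatment of `while` follows the paragraph above (a trivially true condition means the body is always entered).

```haskell
-- Whether the point just after a statement can be reached.
data Reach = Reachable | Unreachable
    deriving (Eq, Show)

-- Sequencing: once control cannot continue, nothing later is reachable.
instance Semigroup Reach where
    Unreachable <> _ = Unreachable
    Reachable   <> r = r

instance Monoid Reach where
    mempty = Reachable

-- Conditions after the Rewriter: a literal, or something not known statically.
data Cond = CTrue | CFalse | CUnknown

data Stmt
    = Return
    | Skip
    | If Cond Stmt Stmt
    | While Cond Stmt
    | Seq [Stmt]

-- Merging two branches: the end is unreachable only if both branches return.
branch :: Reach -> Reach -> Reach
branch Unreachable Unreachable = Unreachable
branch _           _           = Reachable

reach :: Stmt -> Reach
reach Return            = Unreachable
reach Skip              = Reachable
reach (If CTrue t _)    = reach t          -- branch always entered
reach (If CFalse _ e)   = reach e          -- branch never entered
reach (If CUnknown t e) = branch (reach t) (reach e)
reach (While CTrue s)   = reach s          -- body always entered
reach (While _ _)       = Reachable        -- body may be skipped
reach (Seq ss)          = mconcat (map reach ss)

main :: IO ()
main = do
    print (reach (If CUnknown Return Skip))                -- Reachable
    print (reach (Seq [If CUnknown Return Skip, Return]))  -- Unreachable
```

Under this model a non-void function body is accepted when `reach` yields `Unreachable`, meaning every path ends in a `return`.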

## Grammar conflicts

The grammar contains 3 shift/reduce conflicts.

The first conflict is the standard issue with single-statement `if` statements that makes statements of the following form ambiguous:
```
if cond1
if cond2
stmt1
else
stmt2
```

The second conflict is the ambiguity between `(new t) [n]` and `new t[n]`. We cannot distinguish between the creation of an array
and an instantiation of a type with immediate indexing into it. The conflict is correctly resolved in favour of array creation.

The third conflict is between a parenthesised single expression and a cast `null` expression, which is correctly resolved in favour of the `null` expression.

## Sources

A few parts of the code were directly copied from or heavily inspired by my earlier work on the Harper language (https://github.com/V0ldek/Harper),
most notably the monoid-based approach to control flow analysis.

The grammar rules for `null` literals are a slightly modified version of rules proposed by Krzysztof Małysa.
17 changes: 14 additions & 3 deletions app/Main.hs
@@ -6,11 +6,13 @@ import System.Exit (exitFailure, exitSuccess)
import System.IO (hPutStr, hPutStrLn, stderr)

import ErrM (toEither)
import SemanticAnalysis.Analyser (analyse)
import SemanticAnalysis.Toplevel (Metadata, programMetadata)
import Syntax.Abs (Pos, Program, unwrapPos)
import Syntax.Lexer (Token)
import Syntax.Parser (myLexer, pProgram)
import Syntax.Printer (Print, printTree)
import Syntax.Rewriter (rewrite)

type Err = Either String
type ParseResult = (Program (Maybe Pos))
@@ -37,7 +39,7 @@ unlessM p a = do
unless b a

runFile :: Verbosity -> ParseFun ParseResult -> FilePath -> IO ()
runFile v p f = putStrLn f >> readFile f >>= run v p
runFile v p f = readFile f >>= run v p

run :: Verbosity -> ParseFun ParseResult -> String -> IO ()
run v p s = case p ts of
@@ -51,18 +53,27 @@ run v p s = case p ts of
let tree' = unwrapPos tree
putStrErrLn "OK"
showTree v tree'
case programMetadata tree' of
let rewritten = rewrite tree'
putStrErrV v "Rewritten:"
showTree v rewritten
() <- case programMetadata rewritten of
Left err -> do
putStrErrLn "ERROR"
putStrErrLn err
exitFailure
Right meta -> do
showMetadata v meta
case analyse meta of
Right _ -> exitSuccess
Left err -> do
putStrErrLn "ERROR"
putStrErrLn err
exitFailure
exitSuccess
where
ts = myLexer s

showMetadata :: Verbosity -> Metadata -> IO ()
showMetadata :: Verbosity -> Metadata a -> IO ()
showMetadata v meta = putStrV v $ show meta

showTree :: (Show a, Print a) => Int -> a -> IO ()
12 changes: 6 additions & 6 deletions package.yaml
@@ -1,10 +1,10 @@
name: Latte
version: 0.1.0.0
version: 0.8.0.0
github: "githubuser/Latte"
license: BSD3
author: "Author name here"
maintainer: "example@example.com"
copyright: "2020 Author name here"
license: MIT
author: "Mateusz Gienieczko"
maintainer: "matgienieczko@gmail.com"
copyright: "2020 Mateusz Gienieczko"

extra-source-files:
- README.md
@@ -17,7 +17,7 @@ extra-source-files:
# To avoid duplicated efforts in documentation and dealing with the
# complications of embedding Haddock markup inside cabal files, it is
# common to point users to the README.md file.
description: Please see the README on GitHub at <https://github.com/githubuser/Latte#readme>
description: Please see the README on GitHub at <https://github.com/V0ldek/Latte#readme>

dependencies:
- base >= 4.7 && < 5
3 changes: 1 addition & 2 deletions src/ErrM.hs
@@ -23,8 +23,7 @@ instance MonadFail Err where
instance Applicative Err where
pure = Ok
(Bad s) <*> _ = Bad s
(Ok f) <*> o = liftM f o

(Ok f) <*> o = f <$> o

instance Functor Err where
fmap = liftM
35 changes: 27 additions & 8 deletions src/Error.hs
@@ -1,14 +1,33 @@
{-# LANGUAGE FlexibleInstances #-}
module Error where

import Syntax.Abs (Pos, Positioned (..))
import Syntax.Printer (Print, printTree)
import Data.Maybe
import Syntax.Abs (Pos, Positioned (..), Unwrappable (..))
import Syntax.Code

errorMsg :: String -> Pos -> String -> String
class WithContext a where
getCtx :: a -> String

instance WithContext Code where
getCtx = codeString

instance WithContext (Maybe Code) where
getCtx c = maybe "" getCtx c

errorMsg :: Positioned a => String -> a -> String -> String
errorMsg msg a ctx = msg ++ "\n" ++ ctxMsg
where ctxMsg = lineInfo a ++ ":\n" ++ ctx
where
ctxMsg = lineInfo (pos a) ++ ":\n" ++ ctx

errorMsgMb :: Positioned a => String -> Maybe a -> Maybe String -> String
errorMsgMb msg a ctx = msg ++ "\n" ++ ctxMsg
where
ctxMsg = lineInfo (a >>= pos) ++ ":\n" ++ fromMaybe "" ctx

errorCtxMsg :: (Positioned a, Print a) => String -> a -> String
errorCtxMsg msg ctx = errorMsg msg (pos ctx) (printTree ctx)
errorCtxMsg :: (Positioned a, WithContext a, Unwrappable f) => String -> f a -> String
errorCtxMsg msg ctx = errorMsg msg (unwrap ctx) (getCtx $ unwrap ctx)

lineInfo :: Pos -> String
lineInfo (ln, ch) = "Line " ++ show ln ++ ", character " ++ show ch
lineInfo :: Maybe Pos -> String
lineInfo pos = case pos of
Nothing -> ""
Just (ln, ch) -> "Line " ++ show ln ++ ", character " ++ show ch
31 changes: 31 additions & 0 deletions src/Identifiers.hs
@@ -0,0 +1,31 @@
-- Reserved identifiers used internally by the compiler.
-- Any identifier starting with '~' is meant to be invisible
-- to user code and inexpressible using the lexical rules of the language.
module Identifiers where

import Syntax.Abs

-- Identifier of the class that wraps toplevel functions.
topLevelClassIdent :: Ident
topLevelClassIdent = Ident "~cl_TopLevel"

-- Internal identifier of the method being currently compiled.
currentMthdSymIdent :: Ident
currentMthdSymIdent = Ident "~mthd_current"

-- Identifiers used in for loop translation.
forArrayIdent :: Ident
forArrayIdent = Ident "~l_arr"

forIndexIdent :: Ident
forIndexIdent = Ident "~l_idx"


selfSymIdent :: Ident
selfSymIdent = Ident "self"

arrayLengthIdent :: Ident
arrayLengthIdent = Ident "length"

reservedNames :: [Ident]
reservedNames = [selfSymIdent]