Skip to content
/ zuse Public

Compiler for the programming language Kalkyl

Notifications You must be signed in to change notification settings

thyringer/zuse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Logo Kalkyl programming language

Zuse Compiler for Kalkyl

Zuse is the reference compiler for the functional programming language Kalkyl.

In the long term, Zuse is to combine the following components:

  • AOT compiler, which translates Kalkyl code into C and then uses a C compiler to further convert it into an application or lib
  • project and package manager
  • language server

Compiling Zuse

Install Deno on your system and build the executable from the root directory of the project with the following command:

deno compile --output ./builds/zuse ./compiler/Main.ts

Kalkyl Briefly

Kalkyl is a compiled programming language without garbage collection that aims to summarize developments and findings in the field of functional programming in a modern language for everyday software development, with the goal of using the simplest possible syntax to convey these features in an understandable way. In summary, Kalkyl strives

  • to be as systematic as possible, both in terms of syntax and consistency in the features, by combining the simplicity of Python with the power of Haskell;
  • to be pragmatic instead of dogmatically patronizing the programmer, which is why – although purely functional – the programer can still program in the usual imperative way with a syntax similar to Pascal but without its block verbosity;
  • to offer a few general language features that can be easily combined with each other thanks to the totally expression-oriented structure, instead of many special individual solutions with poor orthogonality;
  • to offer a dead simple project management, which is easy to understand for the programmer as well as easily processable by the compiler.

The main focus of Kalkyl's designs lies in referential transparency with only total functions, in order to not only promote the development of programs with as few as possible bugs, but also to exclude errors as far as possible through the language itself. To achieve this design goal, Kalkyl is based on

  • a declarative stateless language structure together with uniqueness and linear types to maintain referential transparency;
  • refinement types as the only means of avoiding unwanted arguments in order to enforce total functions instead of triggering errors.

These two cornerstones also make it much easier to manage dynamic memory safely without garbage collection, for which Kalkyl uses substructural logic in combination with smart pointers.

Feature Overview

  • higher, partially applicable, total functions
  • uniqueness types instead of moands for stateful programming without loss of referential transparency
  • extremely flexible generalized algebraic data types with type constructors that can depend on values
  • multiparametric type classes with functional dependencies and a flexible module system to exchange implementations
  • refinements to specify interfaces precisely, whose compliance can be ensured by the compiler
  • modern dynamic memory management, which allows Kalkyl to be used almost like a scripting language, but without having to sacrifice high performance
  • imperative language constructs that are embedded as expressions and can thus be seamlessly integrated with the rest of the language

About the Name

The name "Kalkyl" – pronounced as [kalˈkyːl] – comes from the first high-level programming language, the "Plankalkül", which Konrad Zuse designed based on the lambda calculus with all the typical features that we know and expect from programming languages ​​today, making him decades ahead of his time.

The "y" in the name is just an international transcription for the German ü sound and an allusion to Python. In addition, projects in Kalkyl are called "plans", similar to how Zuse spoke of "Rechenplänen" [calculation plans] for programs. It is therefore not surprising that Kalkyl's compiler is named after Zuse.

Explanations with Code Examples

This section provides a quick overview of Kalkyl's concepts and appearance. A detailed description of Kalkyl can be found in the docs/ folder. However, this "Language Report" is not intended as a tutorial, but rather as a light version of a specification.

Program Structure

Each Kalkyl program is composed of one or more components, which are reflected at directory level as source files and can also be called translation or compilation units:

component Geometry provides
    area,
    Shape::Circle,
    Shape::Rectangle,
    Shape

# Or simplified, if all data constructors should be public anyway:
component Geometry provides Shape::*, area

use Base.Math unqualified
# Or import only `pi` unqualified and the rest through the namespace `Math`:
use Base.Math with pi

type Shape has Circle Float64 | Rectangle Float64 Float64

area (Circle r) = pi * r**2
area (Rectangle a b) = a * b 

The definition program of the main component forms the entry point:

program = do
    echo "Hello, what’s your name?"
    name <- get
    echo ("Nice to meet you " <> name <> ".")

Unlike Haskell, there is no syntax sugar to make monadic connections appear imperative, since there is simply no need for monads to handle input/output thanks to uniqueness types:

echo : String *IO -> *IO
get  : *IO -> String *IO
(<>) : a b -> c where Concating a b
(<-) : (Var t) (x -> t x) -> (x -> x)
(>>) : (a -> b) (b -> c) -> (a -> c)

Where do is also just a function that accepts any number of arguments and links them together with >>.

Smart Constructors and Declared Tuples

Refinements allow you to specify smart data constructors:

type Even has Even {i : Int | i &mod 2 = 0} deriving coercion

x = Even 6

If only one data constructor is required, which should be named the same as the type constructor, this can be declared even more concisely:

type Even is {i : Int | i &mod 2 = 0}

In fact, all data constructors return their arguments together as a single tuple:

rec = Rectangle 12.3 45.6
a = rec.1 # a → 45.6

Furthermore, data constructors are possible that describe a tuple with field names (record):

type MailingAddress t has MailingAddress
    -str String
    -city String
    -zipcode String
    -country CountryCode

a = MailingAddress "123 Main St" "Springfield" "12345" US
# or with some labeled arguments:
b = MailingAddress "Max-Mustermann-Ring 12b" -country DE -zipcode "0123" -city "Leipzig"
c = b.country # c = DE

If arguments are specified by name, the order does not matter.

The data constructor returns a tuple whose elements can then be accessed after construction. In addition, Kalkyl is also extremely flexible when it comes to applying functions; all arguments can be passed individually (curried), as a whole in the form of a tuple, or mixed between both variants:

i = ("123 Main St", "Springfield", "12345")
a = MailingAddress i US

The passed tuple is automatically disassembled to serve the individual parameters according to the specified order or labels.

No OOP but something much cooler

In Kalkyl, data and functions are deliberately kept separate, because OOP as propagated with C++ or Java is seen as a bad idea. Nevertheless, there is a mechanism that works similarly to method calls in OOP languages, but is not limited to the first parameter, since each function can alternatively be called postfix after one or more arguments, while their types determine the concrete implementation:

use Physics

program = do
    particle1 := Physics.Electromagnetism.Particle 1.0e-6 (0, 0, 0)
    particle2 := Physics.Electromagnetism.Particle -2.0e-6 (0, 0, 1)
    write << "Distance [m]: " <> particle1&distance particle2 &display
    write << "Electric Force [N]: " <> particle1 particle2 &electric-force &display

Unlike in C, the ampersand & functions as a "postfix caller" rather than an address operator. Public functions that are referenceable through the namespaces of their parameter types can be called unqualifiedly on the argument values. Referring to the example above, the compiler then looks for suitable functions in the relevant component Physics.Electromagnetism or generally under this namespace, as it can also be subsequently extended:

component Physics.Electromagnetism provides
    Particle, distance, electric-force

use Math

type Particle is
    -charge Float64 ## Charge of the particle in Coulombs.
    -position (-x, -y, -z Float64) ## Position in meters.
deriving display, equal

## Calculates the distance between two particles.
distance : Particle Particle -> Float64
distance (Particle _ p1) (Particle _ p2) = sqrt
    ((p2.x - p1.x)**2 + (p2.y - p1.y)**2 + (p2.z - p1.z)**2)

## Calculates the electric force between two particles.
electric-force : Particle Particle -> Double
electric-force part1 part2 = let
    r = distance part1 part2
    k = 8.99 * 10**9 # Coulomb's constant in N·m^2/C^2.
in
    0 if r == 0 else k * Math.abs(part1.charge * part2.charge) / r**2

This alternative way of calling functions is therefore just a more convenient syntax to avoid unnecessary qualification through namespaces, along with the elimination of the need for parentheses. Consequently, the postfix caller allows to simulate custom verbal operators:

Math.mod 8 38 &mod 3

Fancy Naming

It should be emphasized that identifiers in Kalkyl can also contain hyphens and even plus signs +:

type FileMode has
    Read/R | Read-Write/R+ | Rewrite/W | Read-Rewrite/W+ | Append/A | Read-Append/A+

File : FileMode -> Type
overwrite-file : (*File W+) String -> (*File W+) 

The slash / simply separates an alternative shorter name, which is preferred at type level, from the first variant.

Lowercase names that begin with a hyphen will be treated as "labels" that can be used to name types and values, as already shown in the previous examples. The hyphen was chosen to avoid an inflated use of : or = and to enable neat syntax, which is a bit reminiscent of CLI program parameters.

This of course has the consequence that spaces are ALWAYS required between the binary operators + and -, which, however, can hardly be seen as a disadvantage, but rather as imposed proper code style. The only real disadvantage is the impossibility of using the minus sign as a unary prefix operator, but let's be honest: How often do you need that? Here, a negation function can serve the same purpose, or the following idiom: -1*x, which is possible since * and / are among the few operators that do not necessarily require whitespace for their operands.

Furthermore, names that bind expressions – whether with or without parameters – that evaluate to Boolean values must end with a question mark:

even? x = mod x 2 == 0

The question mark here is not an operator, but part of the word like a letter. This special notation allows such "predicates" to not only be easier to read, but also to be called directly at the type level in conditions without being confused with type variables:

f x:{n | even? n} = …

Modularity despite Ad hoc Polymorphism

For systematic overloading, Kalkyl offers type classes like in Haskell, which are called concepts, to avoid confusion with classes from OOP languages ​​and to emphasize the basic idea of ​​designing (conceptualizing) interfaces:

concept Displayable a has
    display : a -> String

preinstall module Displayable Bool has
    display True = "true"
    display False = "false"

Implementations of a concept can be globally unique and coherent, as in Haskell, which is called "pre-installing" in Kalkyl, enabling the compiler to automatically find the appropriate instance.

As long as no concrete implementation is required, concepts appear as constraints that impose conditions on type variables to guarantee that a generic function works for their values:

display : a -> String where Displayable a

It may be necessary to replace a pre-installation with another installed module locally:

slang = module Displayable Bool has
    display True = "Yo"
    display False = "Nope"

x = display True using slang # x → "Yo"

Implementations of concepts are first-class citizens in Kalkyl, so they can be defined like ordinary values ​​and processed by functions but also installed with using for a specific definition or with install for an entire component:

install slang

Alternatively, a module is directly usable without prior installation:

x = slang.display True

or instead resolved under namespaces:

use slang unter Alternatives.Slang

x = Alternatives.Slang.display True

Furthermore, parameterizations are also possible to create so-called module templates:

slang true~"Yo" false~"Nope" = module Displayable Bool has
    display True = true : {"Yo", "Yup", "Yeah", "Yes"}
    display False = false : {"Nope", "Nah", "No"}

use slang -false "Nah" under Weird

x = Weird.display False # x → "Nah"

This flexible interchangeability and applicability is ultimately the reason why implementations of concepts are called modules.

Components themselves can be specified by a concept as well, but this only serves the purpose of defining plugins for an existing application:

component KalkylSyntax : ExampleTextEditor.SyntaxHighlighting

use ExampleTextEditor

Functional Programming meets Memory Management

In Kalkyl, the programmer has free rein in memory management – if he wants. But unlike in C, the full power of a modern type system is at work in the background, supported by proven concepts from the field of functional programming, which help the programmer to commit significantly fewer memory errors:

component Memory provides
    (^_), Runtime, Size, Allocating, Releasing, Reallocating, init, copy

alias Nat+ is {n : Nat | n > 0}

(^_) : Type + Nat+ -> Type

## Can only be implemented automatically at runtime and contains the heap state for initial uniqueness.
concept Runtime

concept Allocating t | options given t <=> options has
    malloc : options -> ^t where Runtime

concept Releasing t where Allocating t has
    free : ^t -> ()

concept Reallocating a options | b where
    Allocating {a, b}, Releasing {a, b} # or: Allocating a, Allocating b, …
has
    remalloc : ^a options -> ^b where Runtime

## Guarantees that always initialized memory is allocated.
init : t~$default -> ^t where Default t, Allocating t, Runtime 

## If `-n` is `None`, the entire object gets copied, otherwise only n-bytes from source to destination.
copy : -dest ^t -src ^t -n ?Nat -> -dest ^t -src ^t

init : t -> ^t where Allocating t, Runtime

The conception of memory management functions makes it very easy to offer OS-specific implementations for them, or to replace existing implementations with customized ones.

The constraint Runtime ensures that the compiler does not consider allocating functions to be statically applicable, but rather postpones the evaluation of expressions calling them to runtime. In this sense, Runtime can be thought of as an implicit parameter for an unique argument that gets automatically passed when the program starts, thus maintaining the referential transparency; since Runtime reflects the real fact at code level that the heap state is actually only known at runtime.

High-performance Data Structures in Kalkyl itself

Tuples form the basis for defining more complex data structures:

type Node t is -data t, -next ^Node ~ Null
    # The tilde is a type operator to assign a value as default to a type: (~) : {t : Type} t -> {t}.

extend : (^Node t) t -> ^Node t where Runtime
extend node data = do
    if node /= Null then
        # Copy memory address from `node` to `current` and borrow all rights until `node` is reused:
        current <= node
        while current^.next /= Null loop current := current^.next
        current^.next := init (Node data)
        last := current^.next
        # Once `node` is used, `current` is no longer available as it may still be the original pointer:
        node
    else
        Null

Kalkyl gives the programmer direct access to imperative language constructs to define the most efficient data structure, while the compiler can exclude memory errors from the ground up thanks to its advanced type system exerting substructural logic. The remaining errors related to pointers that are undetectable at compile time get intercepted by smart pointers, which can nevertheless be optimized away for a specific compilation unit (also called components in Kalkyl). The idea is that for test versions and general applications where 100% performance is not crucial, memory related errors get noticed or recorded at runtime.

Simple Rules enable Efficiency despite Abstraction

In contrast to Rust, Kalkyl does not differentiate between "safe" and "unsafe", but basically only allows safe code, whose safety guarantees at runtime can be opted out using respective compilation settings, which is particularly appropriate for proven implementations.

Furthermore, pointers in Kalkyl serve the sole purpose of implementing data structures, but are not intended for everyday use. The programmer primarily sees ordinary (free) or unique types:

extend : (*Array t) t -> *Array t

Instead of requiring a pointer or mutable reference, the function expects – simply expressed by an asterisk – that an array must be passed that no one else references. The compiler can provide this guarantee through simple code analysis at compile time, so that this function is able to manipulate such an array safely without destroying the referential transparency of the code, since that passed array is ultimately not used anywhere else. Internally, of course, a pointer is dereferenced to update the header and data of the array; but to the user of the extend function, Array t appears as a value that just has to be unique.

This abstraction is available if there are specific implementations of the respective memory concepts. Functions such as extend, which actually expect pointers (^Array t) t -> ^Array t, get then automatically abstracted, made possible by the fact that pointers are equally unique and, thanks to type-specific implementations of free, no longer have to be used in a strictly linear manner, i.e. exactly once:

Overview of Kalkyl's substructural type system

Pointers are also referred to as steadfast types in the specification because they are unique and cannot be discarded, but must be used exactly once. However, if there is a pre-installed specific implementation for the concepts of Allocating and Releasing, objects can be created dynamically using new without ever coming into contact with pointers and their steadfast behavior:

# new : options -> *t where Allocating t options, Runtime
init-array x = Array::new x

If such a dynamically allocated object is not returned, the compiler automatically inserts calling free, which allows that object referenced by a pointer to behave like a mere unique value and thus eliminates the need for manual dereferencing, releasing and the like.

Incidentally, no further specification is required if the return type is already known from the context, thanks to type inference:

init-array : t.. -> *Array t where Runtime
init-array x = new x

Note on development and specification

The "Language Report" is not intended as a tutorial or textbook, but rather as a light version of a specification that is largely "finished". However, a few sections are still missing. The language report is also still subject to minor adjustments in general, so examples that refer to features already presented in previous chapters may still be accidentally unchanged.

For example, the lexical structure of Kalkyl was changed to be very easy to parse. Original ideas such as custom literals for paths and tags were discarded, as much more useful DSLs can be built using the curried functions in Kalkyl. For similar reasons, multi-line strings have been discarded, and instead the option of reading text files as strings at compile time is intended, which is much more flexible and at the same time keeps the program code clear. Furthermore, multi-line comments have also been discarded, as these are mostly only used to comment out code, for which Kalkyl now offers the keyword ex instead, which, when placed before declarations, causes the compiler to only check the syntax without resolving names or performing any static checks. All of these simplifications even led to Kalkyl's lexer being able to process each line independently of the others.

Similar further simplifications are to be expected in the future. However, the design of the language is 95% complete and has been thoroughly thought through; hence far-reaching syntax changes are quite unlikely, not least because no further features are planned. The first data types and concepts of the standard library have also already been fixed, with the wealth of experience from Haskell being used here in particular.

About

Compiler for the programming language Kalkyl

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published