This is a redesign and rebuild of the Parrot Compiler Toolkit (PCT), which is installed with Parrot. PCT provides a quick and easy framework to build high-level languages (HLLs) on Parrot, but has its limitations. The main goal of PACT is to provide the same ease of use as PCT, but add easier extensibility, more power, and more features. This project was sparked by benabik++'s comments during his Google Summer of Code 2011 project.
Currently the project is in the planning stages and this repository contains notes and design documents. Assistance, forks, and commentary are welcome!
PACT is written in Winxed and requires the Rosella library to build and test. However, no dependency on Rosella should be required for the compiled library.
The Makefile is just a thin wrapper around a distutils/Rosella based setup.winxed. You can give either one the following commands:
build
(the default): builds all libraries and programstest
: runs all tests in the thet/
directoryinstall
: Installs the libraries and programs in the same location as Parrotclean
: Removes all libraries, programs, and intermediate files.
This directory contains Markdown files that are design documents for PACT.
This directory includes the actual PACT library. The library is built in pieces. Each section of the library is either a single winxed file or a directory containing winxed files that are collected together into a single packfile.
This directory includes sources for all the programs created by this
project. The resulting executable is prefixed with pact_
and is based on
the filename of the source. At the moment this is just pact_disasm
which
is built from src/disasm.winxed
.
This directory includes the tests for the PACT library. Each test file
should end with .t
and output TAP. (Generally by way of the Rosella.Test
library)
- Ease of Hacking
- Similar to PCT
- Modular and Flexible
- Test-Driven Development
- Bytecode Generation
- Typed
- Optimizations
- Round-Trip Code
- Use Compiler Best Practices
PCT was written in PIR, which gives it a lot of access to the Parrot VM, but makes it very difficult to work on. To make the source easier to work on, PACT will be implemented in Winxed, which gives most of the power of direct PIR but in a far more expressive form.
There is a lot of experience in the Parrot community with PCT, and its value in developing large compilers has been proved by Rakudo. PACT does not strive for 100% compatibility, but the features and interface of PCT should be kept in mind while designing/building PACT.
While PACT will strive to be a one-size-fits-all toolkit, we must also recognize that someone will always have something that doesn't quite match what PACT does. PACT should be designed in pieces that are easy to use individually, replace, and combine in new and interesting ways. There are some uses in particular to keep in mind:
- Alternate lexing/parsing frameworks
- Including low-level code in high-level
(Think POST nodes in a PAST tree. A more structured version of inline nodes.) - Compiling portions of a program.
(Given a PAST expression, return a POST expression not a full program. Particularly useful for REPLs and eval functions.)
When working on as complicated a system as a compiler, it is very difficult to determine ahead of time what will break when you alter a section of code. A thorough testing framework will help ensure that the compilers that depend on PACT will not break due to updates.
PCT was very tied to the structure of PIR. PACT is intended to be designed with the idea of building directly to bytecode from the start. In particular, this means that no portion of the system can rely on simply generating PIR that does the right thing.
On the other hand, PIR is an extremely useful format. PBC is not stable across versions of parrot, so being able to save PIR bootstrap steps is invaluable. Other backends such as Winxed may also be useful.
The key to building sane bytecode is maintaining knowledge of what type
every value in the tree is intended to be. Because of this, every PACT
node should maintain an idea of what type it returns (even if this is
'void'). In addition, passing around information encoded into strings
makes further processing of it very difficult so all information in the
tree should be a PACT node. (For example, registers should be
PACT::Register.new('P', 2)
, not '$P2'
.)
While creating fast code can be done by hand, it is usually far more convenient to write what you mean and have the computer make it faster. While optimizing code is not required for the design of PACT, it should be simple to add and customize. Notably, the tree optimization project should be integrated very deeply into the system.
Some basic optimizations like constant folding (for non-PMC values) and dead code elimination can be implemented quickly. More complex optimizations involving SSA and data-flow analysis are not required, but the ability to perform them should be kept in mind.
Generating high quality code is difficult when the output of the compiler is opaque. PASM is a mostly dead format and PIR is often derided for both the amount of hidden work it does and its compiler IMCC. New assembly and intermediate formats are required. These formats should be as easy to generate and process as possible.
The ability to store arbitrary objects in bytecode files means that not every PBC can be regenerated faithfully from disassembly, but handling the simple cases will keep the disassembler honest and make code generation easier to test. Human-readability is key, but ease of writing is not.
Recommended reading: The Dragon Book (Compilers: Principles, Techniques, and Tools by Aho, Lam, Sethi, and Ullman)
Those who do not learn from history are doomed to repeat it. While converting to multiple intermediate steps or worrying about SSA may seem like unnecessary work, using similar phases and styles as other compilers will make it easy to use the lessons learned from them. GCC and LLVM are designed the way they are because it helps them generate fast and correct code, so emulating them is no bad thing.