README.pod - Readme file for the PIRC compiler.
PIRC is a fresh implementation of the PIR language using Bison and Flex. Its main features are:
- thread-safety, so it is reentrant.
- strength reduction, implemented in the parser.
- constant folding, implemented in the parser.
- checking for proper use of op arguments in PIR syntax (disallowing, e.g.: $S0 = print)
- allowing multiple heredocs in subroutine invocations (like: foo(<<'A', <<'B', <<'C') )
- providing register usage optimization
To compile PIRC on Windows using MSVC:
nmake
Running PIRC requires the shared library libparrot; an easy way to make it available is to copy libparrot.dll from the Parrot root directory to compilers/pirc/src.
Running PIRC is as easy as:
pirc test.pir
See 'pirc -h' for help.
The Makefile should work fine on Linux:
cd compilers/pirc && make
PIRC needs the shared library libparrot; in order to let PIRC find it, set the path as follows:
export LD_LIBRARY_PATH=../../../blib/lib
Running is as easy as:
./pirc test.pir
The new Bison/Flex based implementation of the PIR compiler is designed as a two-stage compiler:
- 1. Heredoc preprocessor
- 2. PIR compiler
The heredoc preprocessor takes the input as written by the PIR programmer and flattens out all heredoc strings. An example is shown below to illustrate this concept:
The following input:
.sub main
$S0 = <<'EOS'
This is a heredoc string
divided
over
five
lines.
EOS
.end
is transformed into:
.sub main
$S0 = "This is a heredoc string\n divided\n over\n five\n lines.\n"
.end
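As a rough illustration of the flattening step, here is a minimal Python sketch. It is not PIRC's actual Flex-based implementation in hdocprep.l; the name flatten_heredocs and the regular expression are made up for this example. It assumes at most one single-quoted heredoc per line, with the delimiter alone on its closing line.

```python
import re

# Matches a line ending in a single-quoted heredoc marker, e.g.  $S0 = <<'EOS'
HEREDOC_START = re.compile(r"^(?P<prefix>.*)<<'(?P<tag>\w+)'\s*$")

def flatten_heredocs(source):
    """Replace each heredoc with an equivalent one-line double-quoted string."""
    out = []
    lines = iter(source.splitlines())
    for line in lines:
        match = HEREDOC_START.match(line)
        if match is None:
            out.append(line)
            continue
        body = []
        # Keep consuming from the same iterator until the delimiter line.
        for body_line in lines:
            if body_line == match.group("tag"):
                break
            body.append(body_line)
        # Assumption: backslashes and double quotes in the body are escaped,
        # and every body line ends in a literal "\n" escape sequence.
        text = "".join(
            ln.replace("\\", "\\\\").replace('"', '\\"') + "\\n" for ln in body
        )
        out.append(match.group("prefix") + '"' + text + '"')
    return "\n".join(out)
```

Leading whitespace of the heredoc lines is kept as-is, which matches the flattened string shown above.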
In order to allow an .included file to have heredoc strings, the heredoc preprocessor also handles the .include directive, even though logically this is a macro function. See the discussion below for how the .include directive works.
The PIR compiler parses the output of the heredoc preprocessor. PIRC's lexer also handles macros.
The macro layer basically implements text replacements. The following directives are handled:
.macro
.macro_const
.macro_local
.macro_label
The .include directive takes a string argument, which is the name of a file. The contents of this file are inserted at the point where the .include directive is written. To illustrate this, consider the following example:
main.pir:
========================
.sub main
print "hi\n"
foo()
.end
.include "lib.pir"
========================
lib.pir:
========================
.sub foo
print "foo\n"
.end
========================
This will result in the following output:
.sub main
print "hi\n"
foo()
.end
.sub foo
print "foo\n"
.end
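The textual inclusion can be sketched in a few lines of Python. This is only an illustration, not how PIRC does it internally: the function expand_includes is hypothetical, and a plain dict of file names to contents stands in for the file system. The guard against recursive inclusion is an addition of this sketch.

```python
import re

# A line consisting only of:  .include "some_file"
INCLUDE = re.compile(r'^\s*\.include\s+"(?P<name>[^"]+)"\s*$')

def expand_includes(name, files, _active=()):
    """Return the text of `name` with each .include line replaced by file contents.

    `files` maps file names to their contents (a stand-in for the file system).
    """
    if name in _active:
        raise ValueError("recursive .include of " + name)
    out = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Included files may contain .include directives themselves.
            out.append(expand_includes(match.group("name"), files, _active + (name,)))
        else:
            out.append(line)
    return "\n".join(out)
```

Calling expand_includes("main.pir", files) with the two files from the example returns the combined text shown above.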
The .macro directive starts a macro definition. The macro preprocessor implements the expansion of macros. For instance, the following input:
.macro say(msg)
print .msg
print "\n"
.endm
.sub main
.say("hi there!")
.end
will result in this output:
.sub main
print "hi there!"
print "\n"
.end
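The text replacement behind macro expansion can be sketched as follows. This is a simplified Python model, not PIRC's lexer-based implementation: the function expand_macros is hypothetical, arguments are split naively on commas, and parameter references are replaced even inside string literals.

```python
import re

MACRO_DEF = re.compile(r"^\s*\.macro\s+(\w+)\s*\(([^)]*)\)\s*$")
MACRO_CALL = re.compile(r"^\s*\.(\w+)\s*\((.*)\)\s*$")
PARAM_REF = re.compile(r"\.(\w+)")

def expand_macros(source):
    """Collect .macro ... .endm definitions and expand later invocations."""
    macros = {}                      # name -> (parameter names, body lines)
    out = []
    lines = iter(source.splitlines())
    for line in lines:
        definition = MACRO_DEF.match(line)
        if definition:
            params = [p.strip() for p in definition.group(2).split(",") if p.strip()]
            body = []
            for body_line in lines:  # consume the body up to .endm
                if body_line.strip() == ".endm":
                    break
                body.append(body_line)
            macros[definition.group(1)] = (params, body)
            continue
        call = MACRO_CALL.match(line)
        if call and call.group(1) in macros:
            params, body = macros[call.group(1)]
            # Naive split: assumes no commas inside the arguments.
            args = [a.strip() for a in call.group(2).split(",")] if params else []
            if len(args) != len(params):
                raise ValueError("wrong number of arguments for ." + call.group(1))
            bindings = dict(zip(params, args))
            for body_line in body:
                # Replace .param with its argument; leave other directives alone.
                out.append(PARAM_REF.sub(
                    lambda m: bindings.get(m.group(1), m.group(0)), body_line))
            continue
        out.append(line)
    return "\n".join(out)
```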
The .macro_const directive is similar to the .macro directive, except that a .macro_const is just a simplified .macro; it merely gives a name to some constant:
.macro_const PI 3.14
.sub main
print "PI is approximately: "
print .PI
print "\n"
.end
This will result in the output:
.sub main
print "PI is approximately: "
print 3.14
print "\n"
.end
As Parrot instructions are polymorphic, the PIR compiler is responsible for selecting the right variant of the instruction. The selection is based on the types of the operands. For instance:
set $I0, 42
will select the set_i_ic instruction: the set instruction, taking an integer (i) result operand and an integer constant (ic) operand. Other examples are:
$P0[1] = 42 --> set_p_kic_ic # kic = key integer constant
$I0 = $P0["hi"] --> set_i_p_kc # kc = key constant from constant table
$P1 = new "Hash" --> new_p_sc # sc = string constant
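The selection rule can be modelled as building the op name from the operand types. The Python sketch below is an illustration only: the helpers operand_suffix and select_variant are hypothetical, it works on the PASM-style form (op name followed by comma-separated operands), it does not handle keyed operands such as kic or kc, and the nc suffix for number constants is an assumption not spelled out above.

```python
import re

def operand_suffix(operand):
    """Return the signature suffix for one operand: i/s/n/p, ic, nc or sc."""
    register = re.fullmatch(r"\$([ISNP])\d+", operand)
    if register:
        return register.group(1).lower()   # register of that type
    if re.fullmatch(r"[-+]?\d+", operand):
        return "ic"                        # integer constant
    if re.fullmatch(r"[-+]?\d+\.\d*", operand):
        return "nc"                        # number constant (assumed suffix)
    if operand.startswith(('"', "'")):
        return "sc"                        # string constant
    raise ValueError("unsupported operand: " + repr(operand))

def select_variant(instruction):
    """Build the full op name, e.g. 'set $I0, 42' -> 'set_i_ic'."""
    opname, _, rest = instruction.strip().partition(" ")
    # Naive split: assumes no commas inside string constants.
    operands = [part.strip() for part in rest.split(",")]
    return "_".join([opname] + [operand_suffix(o) for o in operands])
```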
Expressions that can be evaluated at compile-time are pre-evaluated, saving calculations during runtime. Some constant-folding is required, as Parrot depends on this. For instance:
add $I0, 1, 2
is not a valid Parrot instruction; there is no add_i_ic_ic instruction. Instead, this will be translated to:
set $I0, 3
which, as was explained earlier, will select the set_i_ic instruction.
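A minimal Python sketch of this folding step is shown here. It is an assumption-laden model, not PIRC's parser code: the function fold_constants and the table of foldable ops are made up for the example, and only integer literals are handled.

```python
import operator
import re

# Ops this sketch knows how to fold (an illustrative subset).
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def is_int_literal(text):
    return re.fullmatch(r"[-+]?\d+", text) is not None

def fold_constants(op, target, left, right):
    """Turn `op target, left, right` into `set target, value` if both inputs are literals."""
    if op in FOLDABLE and is_int_literal(left) and is_int_literal(right):
        value = FOLDABLE[op](int(left), int(right))
        return "set %s, %d" % (target, value)
    # At least one operand is a register: nothing to fold.
    return "%s %s, %s, %s" % (op, target, left, right)
```

For example, fold_constants("add", "$I0", "1", "2") yields "set $I0, 3", while an instruction with a register operand is returned unchanged.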
The conditional branch instructions are also pre-evaluated, if possible. For instance, consider the following statement:
if 1 < 2 goto L1
It is clear at compile time that 1 is smaller than 2, so the branch to label L1 will always be taken. Instead of evaluating the comparison at runtime, the above statement is replaced with:
goto L1
Likewise, if it's clear that certain instructions don't have any effect, they can be removed altogether:
if 1 > 2 goto L1 --> noop # noop is no opcode.
$I0 = $I0 + 0 --> noop
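The branch case can be sketched in Python along the same lines. The function fold_branch is hypothetical and only handles integer literals; it returns None when the statement can be dropped.

```python
import operator

COMPARE = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

def fold_branch(left, op, right, label):
    """Pre-evaluate `if left op right goto label` when both sides are integer literals.

    Returns 'goto label' if the branch is always taken, None if it is never
    taken (the statement can be removed), or the original statement otherwise.
    """
    try:
        taken = COMPARE[op](int(left), int(right))
    except ValueError:
        # At least one side is not an integer literal, e.g. a register.
        return "if %s %s %s goto %s" % (left, op, right, label)
    return "goto " + label if taken else None
```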
Another type of optimization is the selection of (slightly) more efficient variants of instructions. For instance, consider the following instruction:
$I0 = $I0 + $I1
which is actually syntactic sugar for:
add $I0, $I0, $I1
The += operator present in many C-style languages can simplify this expression, and can also be used in PIR to the same effect. PIRC will simplify this for you. So:
add $I0, $I0, $I1 # $I0 is an out operand
will be optimized, as if you had written:
add $I0, $I1 # $I0 is an in/out operand
The PIR parser can do even more improvements, if it sees opportunity to do so. Consider the following statement:
$I0 = $I0 + 1
or, in Parrot assembly syntax:
add $I0, $I0, 1
The C-like convention here is the increment operator, or $I0++. Parrot has inc and dec instructions built-in as well, so that the above statement $I0 = $I0 + 1 can be optimized to:
inc $I0
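Taken together, these rewrites can be sketched as a single Python function. This is an illustrative model rather than the parser's actual code: reduce_strength is a hypothetical name, and the dec rewrite for subtracting 1 is an assumption made by analogy with inc.

```python
def reduce_strength(op, target, left, right):
    """Rewrite `op target, left, right` into a cheaper form where possible.

    Returns the rewritten instruction, or None if the instruction has no effect.
    """
    if op not in ("add", "sub") or left != target:
        return "%s %s, %s, %s" % (op, target, left, right)
    if right == "0":
        return None                       # $I0 = $I0 + 0 has no effect
    if right == "1":
        # Assumption: `sub` by 1 maps to dec, mirroring add/inc.
        return ("inc " if op == "add" else "dec ") + target
    return "%s %s, %s" % (op, target, right)   # two-operand in/out form
```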
The PIR compiler implements a vanilla register allocator. This means that each declared .local or .param symbol, and each PIR register ($Px, $Sx, $Ix, $Nx), is assigned a unique PASM register, which is associated with the original symbol or PIR register throughout the subroutine.
PIRC has a register optimizer, which can optimize the register usage. Run PIRC with the -r option to activate this. The register optimizer is implemented using a Linear Scan Register allocator.
The implementation of the vanilla register allocator is done in the PIR symbol management module (pirsymbol.c).
PIRC has a register optimizer, which uses a linear scan register allocation algorithm. For each symbolic register, a live-interval object is created, which has a start and an end point, indicating the first and last usage of that symbolic register in the sub. The register optimizer figures out when the live intervals of symbolic registers don't overlap, in which case they can use the same register (assuming they're of the same type).
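The idea of live intervals and register reuse can be sketched in Python as follows. This is a simplified model and not PIRC's implementation: the function names are made up, each instruction is represented only by the list of symbolic registers it uses, symbolic registers are assumed to be written like $I0 so that the type is the second character, and no spilling is modelled.

```python
import heapq
from collections import defaultdict

def live_intervals(instructions):
    """Map each symbolic register to (first_use, last_use) by instruction index."""
    intervals = {}
    for index, registers in enumerate(instructions):
        for reg in registers:
            start, _ = intervals.get(reg, (index, index))
            intervals[reg] = (start, index)
    return intervals

def linear_scan(intervals):
    """Assign register numbers, reusing a number once its interval has ended."""
    assignment = {}
    active = defaultdict(list)       # type -> heap of (end, number) in use
    free = defaultdict(list)         # type -> heap of released numbers
    next_number = defaultdict(int)   # type -> next never-used number
    for reg, (start, end) in sorted(intervals.items(), key=lambda item: item[1][0]):
        kind = reg[1]                # '$I0' -> 'I'; sharing only within a type
        # Expire intervals that ended strictly before this one starts.
        while active[kind] and active[kind][0][0] < start:
            _, number = heapq.heappop(active[kind])
            heapq.heappush(free[kind], number)
        if free[kind]:
            number = heapq.heappop(free[kind])
        else:
            number = next_number[kind]
            next_number[kind] += 1
        assignment[reg] = number
        heapq.heappush(active[kind], (end, number))
    return assignment
```

The strict comparison is a conservative choice: a register whose last use falls on the same instruction as another register's first use is not shared with it.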
Bytecode generation is implemented, but there is the occasional bug. Known bugs are reported at trac.parrot.org.
The PIRC compiler source tree has a number of subdirectories:
- doc - contains documentation.
- heredoc - contains the implementation of the heredoc preprocessor. This is now integrated with pirc/src. It now only has a driver program to build a stand-alone heredoc preprocessor.
- src - contains the Bison/Flex implementation of PIRC
- t - for tests. Test input is compiled and then fed into Parrot, which will run the code.
- macro - contains the old implementation of the macro preprocessor. This is now integrated with pirc/src. These files are kept as a reference until the macro preprocessor in pirc/src is completed.
If you want to make changes to the lexer or parser files, you will need the Flex and/or Bison programs. There are ports available for Windows, but I don't know whether they're any good. I use Cygwin's tools.
The heredoc preprocessor is implemented in hdocprep.l, and can be regenerated using:
cd compilers/pirc/src
flex hdocprep.l
PIRC's normal lexer is implemented in pir.l, and can be regenerated using:
cd compilers/pirc/src
flex pir.l
The parser is implemented in pir.y, and can be regenerated using:
cd compilers/pirc/src
bison pir.y
The file pir.l, from which the lexer is generated, cannot be processed by Cygwin's default version of Flex as of January 2011. In order to make a reentrant lexer, a newer version is needed, which can be downloaded from the link below.
Just do:
$ ./configure
$ make
Then make sure to overwrite the supplied flex binary.
Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-). Please post bug reports in trac.parrot.org.
See also:
- languages/PIR for a PGE based implementation.
- compilers/imcc in the Parrot source tree, the current standard PIR implementation.
- docs/imcc/syntax.pod in the Parrot source tree for a description of PIR syntax.
- docs/imcc/ in the Parrot source tree for more documentation about the PIR language.
- docs/pdds/pdd19_pir.pod in the Parrot source tree for the PIR design document.