Code optimization runs on compiled (ML) code. The compiled code is inspected for sequences of instructions
which can be removed or replaced by functionally equivalent, but shorter and/or faster sequence of instructions. The
new sequence might even be longer than the original one, if it is executing faster (see the
goal
option).
The information on compiler optimizations is a bit technical. It might be useful if you're trying to better understand how Mindcode generates the ML code.
The compiler sometimes creates temporary variables whose only function is to store output value of an instruction
before passing it somewhere else. This optimization removes all assignments from temporary variables that carry over
the output value of the preceding instruction. The set
instruction is removed, while the preceding instruction is
updated to replace the temporary variable with the target variable used in the set statement.
The optimization is performed only when the following conditions are met:
- The
set
instruction assigns from a temporary variable. - The temporary variable is used in exactly one other instruction. The other instruction immediately precedes the instruction producing the temporary variable.
- All arguments of the other instruction referencing the temporary variable are output ones.
push
and pop
instructions are ignored by the above algorithm. push
/pop
instructions of any eliminated variables
are removed by the Stack Optimization down the line.
Case expressions allocate temporary variable to hold the value of the input expression, even if the input expression is actually a user defined variable. This optimization detects these instances, removes the temporary variable and replaces it with the original variable containing the value of the input expression. The set instruction is removed, while the other instructions are updated to replace the temporary variable with the one used in the set statement.
The optimization is performed only when the following conditions are met:
- The set instruction assigns to a case-expression temporary variable.
- The set instruction is the first of all those using the temporary variable (the check is based on absolute instruction sequence in the program, not on the actual program flow).
- Each subsequent instruction using the temporary variable conforms to the code generated by the compiler (i.e. has the form of
jump target <condition> __astX testValue
)
push
and pop
instructions are ignored by the above algorithm. push
/pop
instructions of any eliminated variables
are removed by the stack usage optimization down the line.
This optimization inspects the entire code and removes all instructions that write to variables, if none of the variables written to are actually read anywhere in the code.
This optimization support basic
and advanced
levels of optimization. On the advanced
level,
the optimization removes all dead assignments, even assignments to unused global and main variables.
Dead Code Elimination also inspects your code and prints out suspicious variables:
- Unused variables: those are the variables that were, or could be, eliminated. On
basic
level, some unused variables might not be reported. - Uninitialized variables: those are global variables that are read by the program, but never written to. (The Data Flow Optimization detects uninitialized local and function variables.)
Both cases deserve a closer inspection, as they might be a result of a typo in a variable name.
This optimization handles conditional jumps whose condition can be fully evaluated:
- always false conditional jumps are removed,
- always true conditional jumps are converted to unconditional ones.
A condition can be fully evaluated if both of its operands are literals, or if they're variables whose values were determined to be constant by the Data Flow Optimization.
The first case reduces the code size and speeds up execution. The second one in itself improves neither size nor speed, but allows those jumps to be handled by other optimizations aimed at unconditional jumps.
Conditional jumps are sometimes compiled into an op
instruction evaluating a boolean expression, and a conditional
jump acting on the value of the expression.
This optimization turns the following sequence of instructions:
op <comparison> var A B
jump label equal/notEqual var false
into
jump label <inverse of comparison>/<comparison> A B
Prerequisites:
jump
is anequal
/notEqual
comparison tofalse
,var
is a temporary variable,var
is not used anywhere in the program except these two instructions,<comparison>
has an inverse/<comparison>
exists as a condition.
This optimizer simplifies the following sequences of jump that are a result of certain combinations of optimizations:
- A conditional or unconditional jump targeting the next instruction (such a jump is obviously unnecessary).
- A conditional or unconditional jump identical to the next instruction (unconditional jumps would be also eliminated by Unreachable Code Elimination, but that isn't true for conditional jumps).
- On the
advanced
level, theend
andjump 0 always
instructions at the very end of the program are removed, as the processor will jump to the first instruction of the program upon reaching the end of the instruction list anyway, saving one instruction.
This optimization looks for certain expressions that can be performed more efficiently. Currently, the following optimizations are available:
floor
function called on a multiplication by a constant or a division. Combines the two operations into one integer division (idiv
) operation. In the case of multiplication, the constant operand is inverted to become the divisor in theidiv
operation.sensor var @this @x
andsensor var @this @y
are replaced byset var @thisx
andset var @thisy
respectively. Data Flow Optimization can then apply constant propagation to the@thisx
/@thisy
built-in constants.- Multiplication by literal zero is replaced by a
set
instruction setting the target variable to a zero. - Multiplication/division by one and addition/subtraction of zero are replaced by a
set
instruction setting the target variable to the other operand. - All set instructions assigning a variable to itself (e.g.
set x x
) are removed.
If the optimization level is advanced
, the following additional expressions are handled:
- If the
@constant
in asensor var @constant @id
instruction is a known item, liquid, block or unit constant, the Mindustry's ID of the objects is looked up and the instruction is replaced byset var <id>
, where ID is a numeric literal. Mindcode contains a list of known Mindustry objects and their IDs, obtained by using Mindustry Metadata Extraction extension on the latest Mindustry version. These IDs are expected to be immutable, but the optimization can be turned off by setting the optimization level tobasic
if the actual IDs turn out to be different from the IDs known to Mindcode.
This optimization consists of three types of modifications performed on blocks of code created by if/ternary expressions. All possible optimizations are done independently.
The value of ternary expressions and if expressions is sometimes assigned to a user-defined variable. In these situations, the true and false branches of the if/ternary expression assign the value to a temporary variable, which is then assigned to the user variable. This optimization detects these situations and when possible, assigns the final value to the user variable directly in the true/false branches:
abs = if x < 0 then
negative += 1; // The semicolon cannot be omitted here as it to separates the two statements
-x;
else
positive += 1;
x;
end;
produces this code:
jump 4 greaterThanEq x 0
op add negative negative 1
op mul abs x -1
jump 6 always 0 0
op add positive positive 1
set abs x
end
As the example demonstrates, value propagation works on more than just the set
instruction. All instructions
producing a single value are handled, specifically:
set
op
read
sensor
packcolor
Some conditional expressions can be rearranged to save instructions while keeping execution time unchanged:
print(x < 0 ? "negative" : "positive");
Without If Expression Optimization, the produced code is
jump 3 greaterThanEq x 0
set __tmp1 "negative"
jump 4 always 0 0
set __tmp1 "positive"
print __tmp1
Execution speed:
- x is negative: 4 instructions (0, 1, 2, 4) are executed,
- x is positive: 3 instructions (0, 3, 4) are executed.
The If Expression Optimization turns the code into this:
set __tmp1 "positive"
jump 3 greaterThanEq x 0
set __tmp1 "negative"
print __tmp1
Execution speed:
- x is negative: 4 instructions (0, 1, 2, 3) are executed,
- x is positive: 3 instructions (0, 1, 3) are executed.
The execution time is the same. However, one less instruction is generated.
The forward assignment optimization can be done if at least one of the branches consist of just one instruction, and both branches produce a value which is then used. Depending on the type of condition and the branch sizes, either true branch or false branch can get eliminated this way. Average execution time remains the same, although in some cases the number of executed instructions per branch can change by one (total number of instructions executed by both branches remains the same).
The instruction generator always generates true branch first. In some cases, the jump condition cannot be expressed
as a single jump and requires additional instruction (this only happens with the strict equality operator ===
,
which doesn't have an opposite operator in Mindustry Logic).
The additional instruction can be avoided when the true and false branches in the code are swapped. When this optimizer detects such a situation, it does exactly that:
if @unit.dead === 0 then
print("alive");
else
print("dead");
end;
Notice the print "dead"
occurs before print "alive"
now:
sensor __tmp0 @unit @dead
jump 4 strictEqual __tmp0 0
print "dead"
jump 0 always 0 0
print "alive"
end
The elsif
statements are equivalent to nesting the elsif part in the else
branch of the outer expression.
Optimizations of these nested statements work as expected:
y = if x < 0 then
"negative";
elsif x > 0 then
"positive";
else
"zero";
end;
produces
set y "negative"
jump 5 lessThan x 0
set y "zero"
jump 5 lessThanEq x 0
set y "positive"
end
saving three instructions over the code without if statement optimization:
jump 3 greaterThanEq x 0
set __tmp1 "negative"
jump 8 always 0 0
jump 6 lessThanEq x 0
set __tmp3 "positive"
jump 7 always 0 0
set __tmp3 "zero"
set __tmp1 __tmp3
set y __tmp1
end
This optimization inspects the actual data flow in the program and removes instructions and variables (both user defined and temporary) that are dispensable or have no effect on the program execution. Each individual optimization performed is described separately below.
Data Flow Optimizations can have profound effect on the resulting code. User-defined variables can get completely eliminated, and variables in expressions can get replaced by various other variables that were determined to hold the same value. The goal of these replacements is to allow elimination of some instructions. The optimizer doesn't try to avoid variable replacements that do not lead to instruction elimination - this would make the resulting code more understandable, but the optimizer would have to be more complex and therefore more prone to errors.
Note
One of Mindcode goals is to facilitate making small changes to the compiled code, allowing users to change
crucial parameters in the compiled code without a need to recompile entire program. To this end, assignments to
program parameters are never removed. Any changes to set
instructions in the compiled code assigning value to program parameters are fully supported and the resulting
program is fully functional, as if the value was assigned to the program parameter in the source code itself.
All other variables, including main variables and local variables, can be completely removed by this optimization. Even if they stay in the compiled code, changing the value assigned to a main variable may not produce the same effect as compiling the program with the changed value. In other words, changing a value assigned to main variable in the compiled code may break the compiled program.
If you wish to create a parametrized code, follow these rules for best results:
- Use program parameters as placeholders for the parametrized values.
- Declare the parameters at the very beginning of the program (this way the parameters will be easy to find in the compiled code).
- If you sanitize or otherwise modify the parameter value before being used by the program, store the results of these operations in another variable. Program parameters are read-only in Mindcode and cannot be modified once set.
- Do not modify instructions other than
set
instructions assigning values to program parameters in the compiled code.
On basic and advanced optimization levels, global variables are handled in the same way as program parameters. On the experimental level, global variables are fully optimized in a way similar to main or local variables. Always use program parameters for program parametrization.
On basic
optimization level, the Data Flow Optimization preserves last values assigned to main variables (except
main variables that served as a loop control variable in at least one unrolled loop). This protection is employed to
allow trying out snippets of code in the web application (which runs on basic
optimization level by default)
without the optimizer eliminating most or all of the compiled code due to it not having any effect on the Mindustry
world.
#set optimization = basic;
x0 = 0.001;
y0 = 0.002;
x1 = x0 * x0 + y0 * y0;
y1 = 2 * x0 * y0;
produces
set x0 0.001
set y0 0.002
op add x1 0.000001 0.000004
op mul y1 0.002 0.002
end
On advanced
optimization level, no special protection to main variables is awarded, and they can be completely
removed from the resulting code:
#set optimization = advanced;
x0 = 0.001;
y0 = 0.002;
x1 = x0 * x0 + y0 * y0;
y1 = 2 * x0 * y0;
produces
end
All the assignment were removed as they wouldn't have any effect on the Mindustry world when run in actual logic processor.
The data flow analysis reveals cases where variables might not be properly initialized, i.e. situations where a value of a variable is read before it is known that some value has been written to the variable. Warnings are generated for each uninitialized variable found.
Since Mindustry Logic executes the code repeatedly while preserving variable values, not initializing a variable
might be a valid choice, relying on the fact that all variables are assigned a value of null
by Mindustry at the
beginning. If you intentionally leave a variable uninitialized, you may just ignore the warning. To avoid the
warning, move the entire code into an infinite loop and initialize the variables before that loop. For example:
count += 1;
print(count);
printflush(message1);
can be rewritten as
count = 0;
while true do
count += 1;
print(count);
printflush(message1);
end;
Data Flow Optimization assumes that values assigned to uninitialized variables might be reused on the next program
execution. Assignments to uninitialized variables before calling the end()
function are therefore protected, while
assignments to initialized variables aren't - they will be overwritten on the next program execution anyway:
foo = rand(10);
if initialized == 0 then
print("Initializing...");
// Do some initialization
initialized = 1;
foo = 1;
end();
end;
print("Doing actual work");
print(initialized);
print(foo);
produces this code:
op rand foo 10 0
jump 5 notEqual initialized 0
print "Initializing..."
set initialized 1
end
print "Doing actual work"
print initialized
print foo
end
Notice the initialized = 1
statement is preserved, while foo = 1
is not.
This protection is also applied to assignment to uninitialized variables made before calling a user function which,
directly or indirectly, calls or may call the end()
function:
print(foo);
foo = 5;
bar();
foo = 6;
bar();
def bar()
end();
end;
preserves both assignments to foo
:
print foo
set foo 5
jump 6 always 0 0
set foo 6
jump 6 always 0 0
end
end
end
See also end()
function.
All assignments to variables (except global variables) are inspected and unnecessary assignments are removed. The assignment is unnecessary if the variable is not read after being assigned, or if it is not read before another assignment to the variable is made:
a = rand(10);
a = rand(20);
print(a);
a = rand(30);
compiles to:
op rand a 20 0
print a
op rand a 30 0
end
The first assignment to a
is removed, because a
is not read before another value is assigned to it. The last
assignment to a
is preserved on basic
optimization level, but would be removed on
advanced
level, because a
is not read after that assignment at all.
An assignment can also become unnecessary due to other optimizations carried by this optimizer.
When a variable is used in an instruction and the value of the variable is known to be a constant value, the variable itself is replaced by the constant value. This can in turn make the original assignment unnecessary. See for example:
#set optimization = advanced;
a = 10;
b = 20;
c = @tick + b;
printf("$a, $b, $c.");
produces
op add c @tick 20
print "10, 20, "
print c
print "."
end
Constant propagation described above ensures that constant values are used instead of variables where possible. When
a deterministic operation is performed on constant values (such as addition by the op add
instruction), constant
folding evaluates the expression and replaces the operation with the resulting value, eliminating an instruction.
For example:
#set optimization = advanced;
a = 10;
b = 20;
c = a + b;
printf("$a, $b, $c.");
produces
print "10, 20, 30."
end
Looks quite spectacular, doesn't it? Here's what happened:
- The optimizer figured out that variables
a
andb
are not needed, because they only hold a constant value. - Then it found out the
c = a + b
expression has a constant value too. - What was left was a sequence of print statements, each printing a constant value.
Print Merging optimization on
advanced
level then merged it all together.
Not every opportunity for constant folding is detected at this moment. While x = 1 + y + 2
is optimized to
op add x y 3
, x = 1 + y + z + 2
it too complex to process as this moment and the constant values of 1
and 2
won't be folded at compile time.
If the result of a constant expression doesn't have a valid mlog representation, the optimization is not performed. In other cases, loss of precision might occur.
The Data Flow Optimizer keeps track of expressions that have been evaluated. When the same expression is encountered for a second (third, fourth, ...) time, the result of the last computation is reused instead of evaluating the expression again. In the following code:
a = rand(10);
b = a + 1;
c = 2 * (a + 1);
d = 3 * (a + 1);
print(a, b, c, d);
the optimizer notices that the value a + 1
was assigned to b
after it was computed for the first time
and reuses it in the subsequent instructions:
op rand a 10 0
op add b a 1
op mul c 2 b
op mul d 3 b
print a
print b
print c
print d
end
Again, not every possible opportunity is used. Instructions are not rearranged, for example, even if doing so would allow more evaluations to be reused.
On the other hand, entire complex expressions are reused if they're identical. In the following code
a = rand(10);
b = rand(10);
x = 1 + sqrt(a * a + b * b);
y = 2 + sqrt(a * a + b * b);
print(x, y);
the entire square root is evaluated only once:
op rand a 10 0
op rand b 10 0
op mul __tmp2 a a
op mul __tmp3 b b
op add __tmp4 __tmp2 __tmp3
op sqrt __tmp5 __tmp4 0
op add x 1 __tmp5
op add y 2 __tmp5
print x
print y
end
Variables and expressions passed as arguments to inline functions, as well as return values of inline functions, are processed in the same way as other local variables. Using an inlined function therefore doesn't incur any overhead at all in Mindcode.
Data flow analysis, with some restrictions, is also applied to stackless and recursive function calls. Assignments to global variables inside stackless and recursive functions are tracked and properly handled. Optimizations are applied to function arguments and return values. (This optimization has completely replaced earlier Function call optimization and Return value optimization - all optimizations that could be performed by those optimizations - and some that couldn't - are performed by Data Flow Optimization now.)
Data Flow Optimization gathers information about variables and the values they attain in the program. Apart from using this information for its own optimizations, it also provides it to the other optimizers. The other optimizers can then improve their own optimizations further (e.g. to determine that a conditional jump is always true or always false and act accordingly). Data Flow Optimization is therefore crucial for best performance of some other optimizers as well. Some optimizations, such as Loop Unrolling, might outright require the Data Flow Optimization to be active for their own work.
Loop hoisting is an optimization which tries to identify loop invariant code (i.e. code inside loops which executes identically in each loop iteration) and moves it in front of the loop. This way the code is executed only once, instead of on each loop iteration.
Fo example, in the following code
A = 10;
for i = 0; i < A; i += 1 do
print(2 * A);
end;
the evaluation of 2 * A
is moved in front of the loop in the compiled code (changes highlighted):
0 set A 10
1 set i 0
+ 2 op mul __tmp1 2 A
3 jump 0 greaterThanEq 0 A
- op mul __tmp1 2 A
4 print __tmp1
5 op add i i 1
- jump 3 lessThan i A
+ 6 jump 4 lessThan i A
7 end
A loop condition is processed as well as a loop body, and invariant code in nested loops is hoisted all the way to the top when possible:
A = 10;
for j in 0 ... A do
i = 0;
while i < A + 10 do
i = i + 1;
print(i);
end;
end;
compiles into
0 set A 10
1 set j 0
+ 2 op add __tmp1 A 10
3 jump 0 greaterThanEq 0 A
4 set i 0
- op add __tmp1 A 10
- jump 10 greaterThanEq 0 __tmp1
+ 5 jump 9 greaterThanEq 0 __tmp1
6 op add i i 1
7 print i
- op add __tmp1 A 10
8 jump 6 lessThan i __tmp1
9 op add j j 1
- jump 3 lessThan j A
+ 10 jump 4 lessThan j A
11 end
On advanced
optimization level, Loop Hoisting is capable of handling some if
expressions as well:
#set loop-hoisting = advanced;
A = 10;
for i in 1 ... A do;
k = (A % 2 == 0) ? "Even" : "Odd";
print(i, k);
end;
print("end");
produces
0 set A 10
1 set i 1
- jump 11 greaterThanEq 1 A
2 set k "Odd"
3 op mod __tmp1 A 2
- jump 7 notEqual __tmp1 0
+ 4 jump 6 notEqual __tmp1 0
5 set k "Even"
+ 6 jump 11 greaterThanEq 1 A
7 print i
8 print k
9 op add i i 1
- jump 3 lessThan i A
+ 10 jump 7 lessThan i A
11 print "end"
12 end
At this moment, the following limitations apply:
- If the loop contains a stackless or recursive function call, all global variables are marked as loop dependent and expressions based on them aren't hoisted, since the compiler must assume the value of the global variable could be changed inside a function.
if
expressions are hoisted only when part of simple expressions. Specifically, when theif
expression is nested in a function call (such asprint(A < 0 ? "positive" : "negative")
), it won't be optimized.
The loop optimization improves loops with the condition at the beginning by performing these modifications:
- If the loop jump condition is invertible, the unconditional jump at the end of the loop to the loop condition is
replaced by a conditional jump with inverted loop condition targeting the first instruction of the loop body. This
doesn't affect the number of instructions, but executes one less instruction per loop.
- If the loop condition isn't invertible (that is, the jump condition is
===
), the optimization isn't done, since the saved jump would be spent on inverting the condition, and the code size would increase for no benefit at all.
- If the loop condition isn't invertible (that is, the jump condition is
- If the previous optimization was done and the loop condition is known to be true before the first iteration of the loop, the optimizer removes the jump at the front of the loop. The Loop Optimizer utilizes information gathered by Data Flow Optimization to evaluate the initial loop condition.
- Loop conditions that are complex expressions spanning several instructions can still be replicated at the
end of the loop, if the code generation goal is set to
speed
(the default setting at the moment). As a result, the code size might actually increase after performing this kind of optimization. See optimization for speed for details on performing this kind of optimizations.
The result of the first two optimizations in the list can be seen here:
param LIMIT = 10;
for i in 0 ... LIMIT do
cell1[i] = 1;
end;
print("Done.");
produces
set LIMIT 10
set i 0
jump 6 greaterThanEq 0 LIMIT
write 1 cell1 i
op add i i 1
jump 3 lessThan i LIMIT
print "Done."
end
Executing the entire loop (including the i
variable initialization) takes 32 steps. Without optimization, the loop
would require 43 steps. That's quite significant difference, especially for small loops.
The third modification is demonstrated here:
while switch1.enabled and switch2.enabled do
print("Doing something.");
end;
print("A switch has been reset.");
which produces:
sensor __tmp0 switch1 @enabled
sensor __tmp1 switch2 @enabled
op land __tmp2 __tmp0 __tmp1
jump 9 equal __tmp2 false
print "Doing something."
sensor __tmp0 switch1 @enabled
sensor __tmp1 switch2 @enabled
op land __tmp2 __tmp0 __tmp1
jump 4 notEqual __tmp2 false
print "A switch has been reset."
end
Loop unrolling is a speed optimization, and as such is only active when the
goal
option is set to speed
or auto
. Furthermore, loop unrolling depends
on the Data Flow optimization and isn't functional when Data Flow Optimization is not
active.
Loop unrolling optimization works by replacing loops whose number of iterations can be determined by the compiler with a linear sequence of instructions. This results in speedup of program execution, since the jump instruction representing a condition which terminates the loop, and oftentimes also instruction(s) that change the loop control variable value, can be removed from the unrolled loop and only instructions actually performing the intended work of the loop remain. The optimization is most efficient on loops that are very "tight" - contain very little instructions apart from the loop itself. The most dramatic practical example is probably something like this:
#set loop-unrolling = none;
for i in 0 ... 10 do
cell1[i] = 0;
end;
This code clears the first ten slots of a memory cell. Without loop unrolling, the code would look like this:
set i 0
write 0 cell1 i
op add i i 1
jump 2 lessThan i 10
It takes 31 instruction executions to perform this code. With loop unrolling, though, the code is changed to this:
write 0 cell1 0
write 0 cell1 1
write 0 cell1 2
write 0 cell1 3
write 0 cell1 4
write 0 cell1 5
write 0 cell1 6
write 0 cell1 7
write 0 cell1 8
write 0 cell1 9
The size of the loop is now 10 instructions instead of 4, but it takes just these 10 instructions to execute, instead of the 31 in the previous case, executing three times as fast!
The price for this speedup is the increased number of instructions themselves. Since there's a hard limit of 1000 instructions in a Mindustry Logic program, loops with large number of iterations obviously cannot be unrolled. See speed optimization for an explanation of how Mindcode decides whether to unroll a loop.
Apart from removing the superfluous instructions, loop unrolling also replaces variables with constant values. This can make further optimizations opportunities arise, especially for a Data Flow Optimizer and possibly for others. A not particularly practical, but nonetheless striking example is this program which computes the sum of numbers from 0 to 100:
sum = 0;
for i in 0 .. 100;
sum += i;
end;
print(sum);
which compiles to (drumroll, please):
print 5050
end
What happened here is this: the loop was unrolled to individual instructions in this basic form:
set sum 0
set i 0
op add sum sum i
op add i i 1
op add sum sum i
op add i i 1
...
Data Flow Optimization then evaluated all those expressions using the combination of constant propagation and constant folding, all the way into the final sum of 5050, which is then directly printed.
A list iteration loop can be always unrolled, if there's enough instruction space left.
For other loops, unrolling can generally be performed when Mindcode can determine the loop has a certain fixed number of iterations and can infer other properties of the loop, such as a variable that controls the loop iterations. A loop should be eligible for the unrolling when the following conditions are met:
- The loop is controlled by a single, local or main variable: the loop condition must consist of a variable which is modified inside a loop, and a constant or an effectively constant variable. Loops based on global variables cannot be unrolled.
- The loop control variable is modified inside the loop only by
op
instructions which have the loop control variable as a first argument and a constant or an effectively constant variable as a second argument; the op instructions must be deterministic. Any other instruction that sets the value of loop control variable precludes loop unrolling. - All modifications of the loop control variable happen directly in the loop body: the variable must not be modified in a nested loop or in an if statement, for example.
- The loop has a nonzero number of iterations. The upper limit of the number of iterations depends on available instruction space, but generally can never exceed 1000 iterations.
Furthermore:
- If the optimization level is
basic
, the loop control variable can only be modified byadd
andsub
operations. Every value the loop control variable attains in individual iterations must be an integer (it means that the starting value, ending value and every change of the iteration variable must be an integer as well). The loop condition must be expressed using one of these operators:>
,<
,>=
or<=
. In this mode, the total number of iterations is computed using the starting and ending value of the variable and the change in each iteration (resulting in a fast computation). - If the optimization level is
advanced
, every deterministic update of loop control variable by a constant value and every form of loop condition is allowed. In this case, Mindcode determines the total number of iterations by emulating the entire execution of the loop, until the loop exits or the maximum possible number of iterations allowed by available instruction space is reached (meaning the loop cannot be unrolled). break
andcontinue
statements are supported. However, when using abreak
statement, the entire loop is still unrolled. Possible superfluous code after a break statement might be removed by other optimizers.
Examples of loops that can be unrolled on basic
optimization level:
// Basic case
for i in 0 .. 10 do
cell1[i] = cell1[i + 1];
end;
// Two separate increments of the loop control variable
j = 0;
while j < 20 do
println(j);
j += 1;
println(j);
j += 2;
end;
// The loop control variable can be used in further expressions
for k in 0 ... 10 do
cell1[k] = cell1[2 * k];
end;
for l in 0 ... 10 do
println(l % 2 ? "Odd" : "Even");
end;
// Loop inside an inline function: can be unrolled if a constant value is passed into the size parameter
inline def copy(src, dst, size)
for i in 0 ... size do
dst[i] = src[i];
end;
end;
// This will produce a loop that can be unrolled
copy(cell1, cell2, 10);
// This produces a loop that CANNOT be unrolled: SIZE is not a constant value
param SIZE = 10;
copy(cell1, cell2, SIZE);
// Some loops containing expressions in the condition can still be unrolled,
// but it strongly depends on the structure of the expression
i = 0;
while i += 1 < 10 do
print(i);
end ;
Examples of loops that can be unrolled on advanced
optimization level:
// An operation different from add and sub is supported
for i = 1; i < 100000; i <<= 1 do
println(i);
end;
// This loop can be unrolled, because it terminates
// (if the condition was j > 0, it would never terminate)
j = 10;
while j > 1 do
j += 1;
println(i);
j \= 2;
println(i);
end;
// This loop is unrolled, but the number of iterations is 11!
// The code produces the same output as if it wasn't unrolled.
// This is because of rounding errors when evaluating floating-point expressions
for k = 0; k < 1; k += 0.1 do
println(k);
end;
Examples of loops that cannot be unrolled:
// LIMIT is a global variable and as such the value assigned to it isn't considered constant
// (see Data Flow Optimization)
LIMIT = 10;
for i in 0 ... LIMIT do
cell1[i] = 0;
end;
// The loop control variable is changed inside an if
i = 0;
while i < 10 do
i += 1;
print(i);
if i % 5 == 0 then
i += 1;
print("Five");
end;
end;
// There isn't a single loop control variable - loop condition depens on both i and j
for i = 0, j = 10; i < j; i += 1, j -= 1 do
print(i);
end;
// The expression changing the loop control variable is too complex.
// (Rewriting the assignment to i *= 2; i += 1; would allow unrolling)
i = 0;
while i < 1000 do
i = 2 * i + 1;
print(i);
end;
// This loop won't be unrolled. We know it ends after 5 iterations due to the break statement,
// but Mindcode assumes 2000 iterations, always reaching the instruction limit.
for i in 0 ... 2000 do
if i > 5 break; end;
print(i);
end;
Nested loops can also be unrolled, and the optimizer prefers unrolling the inner loop:
k = 0;
for i in 0 ... 100 do
for j in 0 ... 100 do
k = k + rand(j); // Prevents collapsing the loop by Data Flow Optimization
end;
end;
Both loops are eligible for unrolling at the beginning, and the inner one is chosen. After that the outer loop can no longer be unrolled, because the instruction limit would be hit.
Sometimes unrolling an outer loop can make the inner loop eligible for unrolling too. In this case, the inner loop cannot be unrolled first, as it is not constant:
#set optimization = advanced;
first = true;
for i in 1 .. 5 do
for j in i .. 5 do
if first then
first = false;
else
print(" ");
end;
print(10 * i + j);
end;
end;
In this example, the outer loop is unrolled first, after which each copy of the inner loop can be unrolled independently. Further optimizations (including Print Merging) then compact the entire computation into a single print instruction:
print "11 12 13 14 15 22 23 24 25 33 34 35 44 45 55"
end
Function Inlining is a speed optimization, and as such is only active when the
goal
option is set to speed
or auto
.
Function inlining converts out-of-line function calls into inline function calls. This conversion alone saves a few instructions: storing the return address, jumping to the function body and jumping back at the original address. However, many additional optimizations might be available once a function is inlined, especially if the inlined function call is using constant argument values. In such a situation many other powerful optimizations, such as constant folding or loop unrolling, may become available.
User defined, non-recursive function which is called just once in the entire program, is automatically inlined and
this cannot be prevented: such code is always both faster and smaller. It is also possible to declare individual
functions using the inline
keyword, forcing all calls of such functions to be inlined.
This optimization can inline additional functions that aren't recursive and also aren't declared inline
in the
source code. If there's enough instruction space, all function calls may be inlined and the original function body
removed from the program.
When the optimization level is set to advanced
and there isn't enough instruction space, only a single one or
several specific function calls may be inlined; in such case the original function body remains in the program and
is used by the function calls that weren't inlined. If there are only last two function calls remaining, either both
of them, or none of them, will be inlined.
It is therefore no longer necessary to use the inline
keyword, except in cases when Mindcode's automatic inlining
chooses function different from the one(s) you prefer to be inlined.
Case Switching is a speed optimization, and as such is only active when the
goal
option is set to speed
or auto
.
Case expressions are normally compiled to a sequence of conditional jumps: for each when
branch the entry
condition(s) of that clause is evaluated; when it is false
, the control is transferred to the next when
branch,
and eventually to the else
branch or end of the expression. This means the case expression evaluates - on average -
half of all existing conditions, assuming even distribution of the input values of the case expression. (If
some input values of the case expressions are more frequent, it is possible to achieve better average execution
times by placing those values first.)
The Case Switching optimization improves case expressions which branch on integer values of the expression.
Warning
It is assumed that a case statement branching exclusively on integer values always get integer value on input as well. If the input value of the case expression may take on non-integer values, this optimization will produce wrong code. At this moment Mindcode isn't able to recognize such situation; if this is the case, you need to disable the Case Switching optimization manually.
The sequence of conditional jumps in such statements is replaced by a jump table. A jump table facilitates
direct jumps to the corresponding when
branch. The actual instructions used to build a jump table are
jump <else branch address> lessThan value minimal_when_value
jump <else branch address> greaterThan value maximal_when_value
op add @counter <start_of_jump_table - minimal_when_value> value
start_of_jump_table:
jump <when branch for minimal when value address>
jump <when branch for minimal when value + 1 address>
jump <when branch for minimal when value + 2 address>
...
jump <when branch for maximal when value address>
The jump table is put in front of the when
branches. Original conditions in front of each processed when
branch
are removed. Each when
branch jumps to the end of the case expression as usual.
To build the jump table, the minimum and maximum value of existing when
branches are determined first. Values
outside this range are handled by the else
branch (if there isn't an explicit else
branch in the case statement,
the else
branch just jumps to the end of the case expression). Values inside this range are mapped to a particular
when
branch, or, if the value doesn't correspond to any of the when
branches, to the else
branch.
The first two instructions in the example above (jump lessThan
, jump greaterThan
) handle the cases where the
input value lies outside the range supported by the jump table. The op add @counter
instruction then transfers
the control to the corresponding specific jump in the jump table and consequently to the proper when
branch.
The jump table executes at most four instructions on each case expression execution (less if the input value lies outside the supported range). We've mentioned above that the original case statement executes half of the conditional jumps on average. This means that converting the case expression to a jump table only makes sense when there's more than 8 conditional jumps in the case expression.
Notes:
- If you put the more frequent values first in the case expression, and the value distribution is very skewed, converting the case expression to the jump table might actually worsen the average execution time. Mindcode has no way to figure this on its own; if you encounter this situation, you might need to disable the Case Switching optimization for your program.
- If the input value of the case expression could be determined by Mindcode to already lie in a specific range,
it would be possible to avoid the
jump lessThan
and/orjump greaterThan
instructions at the beginning of the jump table, as their function is to ensure the value lies in a proper range. This would provide a potentially significant additional speedup and is planned for some future version. - Currently, there's no limit on the size of the jump table. For a case expression handling values 1 to 10 and then a value of 100, the jump table would have 100 entries. This affects computation of the optimization's benefit, and might make the optimization less favorable compared to other optimizations; however if available space permits it, such a jump table would be created.
The following conditions must be met for a case expression to be processed by this optimization:
- All values used in
when
clauses must be effectively constant. - All values used in
when
clauses must be integers. - Values used in
when
clauses must be unique. - There are no ranges in the
when
clauses.
value = floor(rand(20));
text = case value
when 0 then "None";
when 1 then "One";
when 2, 3, 4 then "A few";
when 5, 6, 7, 8 then "Many";
when 10 then "Ten";
else "I don't known this number!";
end;
print(text);
The above case expression is transformed to this:
op rand __tmp0 20 0
op floor value __tmp0 0
op min __tmp3 value 11
op max __tmp3 __tmp3 -1
op add @counter __tmp3 6
jump 28 always 0 0
jump 18 always 0 0
jump 20 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 28 always 0 0
jump 26 always 0 0
jump 28 always 0 0
set __tmp2 "None"
jump 29 always 0 0
set __tmp2 "One"
jump 29 always 0 0
set __tmp2 "A few"
jump 29 always 0 0
set __tmp2 "Several"
jump 29 always 0 0
set __tmp2 "Ten"
jump 29 always 0 0
set __tmp2 "I don't known this number!"
print __tmp2
end
Return Optimization is a speed optimization, and as such is only active when the
goal
option is set to speed
or auto
.
The Return Optimization is very simple. Whenever there's an unconditional jump to the final sequence of instructions representing a return from the call (which is always three instructions long), the jump is replaced by the entire return sequence. The jump execution is avoided at the price of two additional instructions.
The impact of this optimization is probably marginal. Recursive functions are of limited use by themselves, and this optimization only applies in a rather specific contexts.
This optimization detects situations where a conditional jump skips a following, unconditional one and replaces it with a single conditional jump with a reversed condition and a target of the second jump. Example:
jump __label0 equal __tmp9 false
jump __label1
label __label0
will be turned to
jump __label1 notEqual __tmp9 false
Optimization won't be done if the condition does not have an inverse (strictEqual
).
These sequences of instructions may arise when using the break
or continue
statements:
while true do
...
if not_alive then
break;
end;
end;
If a jump (conditional or unconditional) targets an unconditional jump, the target of the first jump is redirected to the target of the second jump, repeated until the end of jump chain is reached. Moreover:
- on
advanced
level,end
instruction is handled identically tojump 0 always
, - conditional jumps in the jump chain are followed if:
- their condition is identical to the condition of the first jump, and
- the condition arguments do not contain a volatile variable (
@time
,@tick
,@counter
etc.).
- unconditional jumps targeting an indirect jump (a
set @counter
instruction, internally represented by a virtual
goto
instruction) are replaced with the indirect jump itself.
No instructions are removed or added, but the execution of the code is faster.
This optimizer removes instructions that are unreachable. There are several ways unreachable instructions might appear:
- Jump Threading can create unreachable jumps that are no longer targeted.
- User-created unreachable regions, such as
while false ... end
, or code following awhile true
loop. - User defined functions which are called from an unreachable region.
Instruction removal is done by analyzing the control flow of the program and removing instructions that are never executed. When Jump Normalization is not active, some section of unreachable code may not be recognized.
The end
instruction, even when not reachable, is not removed unless the optimization level is advanced
.
Main body program and function definitions are each terminated by the end
instruction and removing it
might make the produced code somewhat less readable.
Optimizes the stack usage -- eliminates push
/pop
instruction pairs determined to be unnecessary. The following
optimizations are performed:
push
/pop
instruction elimination for function variables that are not used anywhere apart from the push/pop instructions. This happens when variables are eliminated by other optimizations. The optimization is done globally, in a single pass across the entire program.push
/pop
instruction elimination for function variables that are read neither by any instruction between the call instruction and the end of the function, nor by any instruction which is part of the same loop as the call instruction. Implicit reads by recursive calls to the same function with the value of the parameter unchanged are also detected.push
/pop
instruction elimination for function parameters/variables that are never modified within the function.
This optimization merges together print instructions with string literal arguments.
The print instructions will get merged even if they aren't consecutive, assuming there aren't instructions
that could break the print sequence (jump
, label
or print <variable>
).
Effectively, this optimization eliminates a print
instruction by turning this
println("Items: ", items);
println("Time: " @time);
into this:
print("Items: ", items);
print("\nTime: ", @time "\n");
On advanced
level, all constant values - not just string constants - are merged. For example:
#set optimization = advanced;
const MAX_VALUE = 10;
printf("Step $i of $MAX_VALUE\n");
produces
print "Step "
print i
print " of 10\n"
On basic optimization level, the output would be:
print "Step "
print i
print " of "
print 10
print "\n"
On basic
level, the optimization won't merge print instructions if the merge would produce a string
longer than 34 characters (36 when counting the double quotes). On advanced
level, such instructions
will be merged regardless. This can create long string constants, but according to our tests these can be pasted
into Mindustry processors even if they're longer than what the Mindustry GUI allows to enter.
The print merging optimization will also merge print
instructions generated by the remark()
function. Instructions
generated by remarks are merged differently to standard print
instructions:
print
instructions generated from differentremark()
function calls are never merged. Only instructions generated from a singleremark()
are merged together.- All constant values are merged together regardless of the resulting string length, even on the
basic
optimization level.
If the print merging optimization is not active, instructions from remark()
functions aren't merged.
In some cases, it is possible to rearrange the code in such a way that, even though it consist of more instructions, it actually executes faster. An example of this kind of optimization is function call inlining: instead of having a single copy of the function called from various places, the entire code of the function can be replicated at the places it is called from. The speedup stems from the fact that passing arguments to the function, storing the return address and later returning back from the function can be avoided this way, while the code size increase is a consequence of having two or more copies of the function instead of just one.
Mindustry processors limit the total number of instructions in a program to a thousand. If the compiled code exceeds this limit, it is unusable, but if the compiled code is smaller than the limit, the unused instruction space brings no benefit at all. The best approach therefore is to use as much of the existing instruction space without exceeding the limit to generate faster code where possible.
Mindcode provides the goal
option to specify whether Mindcode should
generate code that is smaller (the goal is size
) or a code that is faster (speed
). There's a third option -
auto
- which is the default for the compiler and currently is identical to the speed
option. When the goal is
set to size
, only optimizations with a zero or negative cost are performed and no statistics about speed
optimizations is displayed.
Another setting affecting the speed optimizations is the instruction-limit
option. It allows to change the instruction limit considered by
the optimizer.
When generating the code, there might be several opportunities for speed optimizations. It is possible that by applying all of them the maximal number of instructions would be exceeded. Mindcode follows the following process to choose the optimizations that will be realized:
- At certain point in the optimization process, all opportunities for speed optimizations are analyzed and those that can be realized in the remaining instruction space are gathered.
- For each such possible optimization, the following quantities are computed:
- cost: the number of additional instructions that will be created by the optimization,
- benefit: a measure of execution improvement achieved by the optimization,
- efficiency: the efficiency of the optimization is computed by dividing the benefit by cost, obtaining a measure of program improvement per additional instruction of code.
- The optimization with the highest efficiency is applied. If several optimizations have the same efficiency, the one with the smallest cost is applied.
- The entire process is repeated from start until there are no optimization possibilities left, or none of the optimizations is permissible given available instruction space.
In short, as many as possible optimizations are applied in the order from one giving best returns to the one giving worst return. Different strategies for choosing optimizations might be possible (for example, instead of realizing one large optimization, it might be better to apply two smaller ones), but at this point these different arrangements aren't considered.
Oftentimes, an optimization being applied might open up opportunities for further optimizations (in some cases it might be possible to unroll inner loop after unrolling the outer loop, for example). Since all remaining optimizations are considered after applying the most efficient one, Mindcode will automatically discover and process these additional optimizations, in the order described above.
It's quite obvious that calculating the benefit is a key part of the optimization evaluation. Avoiding an instruction that is executed a lot provides more benefit that avoiding one that is executed just once (in Mindustry, each instruction takes the same time to execute, simplifying the things greatly). Mindcode therefore assigns each instruction a weight, a measure of how often the instruction is expected to be executed. In general, it is impossible to compute this number precisely. Mindcode uses a very simple algorithm to determine instruction weights:
- At the beginning of code generation, current weight is established:
- 1 for the main program,
- number of places a function is called from for out-of-line or recursive functions.
- Each generated instruction is assigned the current weight.
- When entering a branch of an
if
statement, the current weight is halved. It is restored when exiting the branch. This corresponds to a very simplistic expectation that the condition will be evaluated to true 50% of the time, and therefore a branch in the if statement will be executed only half as often as the surrounding code. - When entering a
when
branch of acase
expression, the weight is divided by the number of branches in the expression. The reasoning (and inaccuracy) is analogous to theif
statement approximation described above. - When entering a loop body, the weight is multiplied by 25. The implicit expectation is that a loop will iterate 25 times on average (which is certainly wrong). Even when the number of iterations is known at the compilation time, the weight adjustment is the same for all loops. This is to prevent Mindcode from preferring optimizations of loops with large, known number of iterations (such as setting all values of memory bank to 0) to other loops whose number of iterations cannot be known.
- Weight of stackless and recursive functions is adjusted:
- Stackless function weights are iteratively recomputed as a total weight of the respective
CALL
instructions of the given function. - Recursive function weights are then computed as a total weight of the respective
CALLREC
instructions of the given function. No iterative updating is made for recursive functions.
- Stackless function weights are iteratively recomputed as a total weight of the respective
- When instructions are duplicated or moved during optimizations, their weights are adjusted according to the context into which they were duplicated or moved.
The benefit of an optimization is then computed as the sum of weights of instructions that would be avoided thanks
to the optimization. The net result is that Mindcode strongly prefers optimizing code inside loops, and defers
optimizations inside the branches of if
and case
statements.
« Previous: Compiler directives | Next: Function reference for Mindustry Logic 6 »