This repository has been archived by the owner on Apr 10, 2019. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
backgnd_mercury.tex
621 lines (561 loc) · 22.4 KB
/
backgnd_mercury.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
\status{This section is complete.
}
Mercury is a pure logic/functional programming language
intended for the creation of large, fast, reliable programs.
Although the syntax of Mercury is based on the syntax of Prolog,
semantically the two languages are different
due to Mercury's purity and its type, mode, determinism and module systems.
Mercury programs consist of modules,
each of which has a separate namespace.
Individual modules are compiled separately.
Each module contains predicates and functions.
Functions are syntactic sugar for predicates with an extra argument for the
their result.
In the remainder of this dissertation we will use the word predicate to
refer to either a predicate or a function.
A predicate or function $P$ is defined in terms of a goal $G$:
%\vspace{-1\baselineskip}
% \begin{figure}[htb]
$$
\begin{array}{lll}
~P
& :~ p(X_1, \ldots, X_n)~\leftarrow~G
& \hbox{predicates} \\
% & |~ f(X_1, \ldots, X_n)=X_{n+1}~\leftarrow~G
% & \hbox{functions} \\
\end{array}
$$
% \paul{Peter suggested using a more grammar-like description of the language.
% I think that his motivation was that terminals and non-terminals should look
% more distinct.
% I have capitalised the variable names since that's closer to mercury syntax,
% it is not the same as showing terminals and non-terminals differently but
% perhaps it is clearer.}
% This has to float otherwise there's too much vertical whitespace on the
% first page.
\begin{figure}
\[
\begin{array}{lll}
G
& :~ X = Y ~|~ X = f(Y_1,~\ldots,~Y_n)
& \hbox{unifications}\\
& |~ p(X_1,~\ldots,~X_n)
& \hbox{predicate calls} \\
% & |~ X_{n+1} = f(X_1,~\ldots,~X_n)
% & \hbox{function calls} \\
& |~ X_0(X_1,~\ldots,~X_n)
& \hbox{higher order calls} \\
& |~ m(X_1,~\ldots,~X_n)
& \hbox{method calls} \\
& |~ \hbox{foreign}(p,
[X_1:~Y_1,~\ldots,X_n:~Y_n],
\hbox{\emph{foreign code}}),
& \hbox{foreign code} \\
& |~ (G_1,~\ldots,~G_n)
& \hbox{sequential conjunctions}\\
& |~ (G_1~\&~\ldots~\&~G_n)
& \hbox{parallel conjunctions}\\
& |~ (G_1 ; \ldots ; G_n)
& \hbox{disjunctions}\\
& |~ \hbox{switch}~X~(f_1:~G_1~;~\ldots~,f_n:~G_n)
& \hbox{switches}\\
& |~ (if~G_{cond}~then~G_{then}~else~G_{else})
& \hbox{if-then-elses}\\
& |~ not~G
& \hbox{negations}\\
& |~ some~[X_1,\ldots,X_n]~G
& \hbox{existential quantification}\\
% & |~ all~[X_1,\ldots,X_n]~G
% & \hbox{universal quantification} \\
& |~ promise\_pure~G
& \hbox{purity promise}\\
& |~ promise\_semipure~G
& \hbox{purity promise}\\
\end{array}
\]
\caption{The abstract syntax of Mercury}
\label{fig:abstractsyntax}
\end{figure}
\noindent
Mercury's goal types are shown in Figure~\ref{fig:abstractsyntax}.
The atomic goals are unifications
(which the compiler breaks down until each one contains
at most one function symbol),
plain first-order calls,
higher-order calls,
typeclass method calls (similar to higher-order calls),
and calls to predicates defined by code in a foreign language (usually C).
% \peter{You only need to cover enough about Mercury for the purposes of
% your thesis. I do not think how inlining of foreign code works is important.}
% \paul{We use it to implement primitives on which our transformations are
% built,
% I may remove it later if the detail of how this is done is unimportant.}
The compiler may inline foreign code definitions when generating code
for a Mercury predicate.
To allow this, the representation of a foreign code construct includes
not just the name of the predicate being called,
but also the foreign code that is the predicate's definition
and the mapping from the Mercury variables that are the call's arguments
to the names of the variables that stand for them in the foreign code.
The composite goals include
sequential and parallel conjunctions,
disjunctions, if-then-elses, negations and existential quantifications.
Section~\ref{sec:backgnd_merpar} contains a detailed description of parallel
conjunctions.
The abstract syntax does not include universal quantifications:
they are allowed at the source level,
but are transformed into a combination of two negations and an existential
quantification
(Equation~\ref{eqn:expand_univ_quant}).
Universal quantifications do not appear in the compiler's internal goal
representation.
Similarly,
a negation can be simplified using the constants $true$ and $false$,
and an if-then-else (Equation~\ref{eqn:expand_neg}).
However the compiler does not make this transformation and
negations may appear in the compiler's goal representation.
Nevertheless, we will not discuss negations in this dissertation,
since our algorithms' handling of negations is equivalent
to their handling of the below if-then-else.
\paul{Bring back negation.}
\begin{align}
all~[X_1,~\ldots,~X_n]~G~&\rightarrow~not~(~some~[X_1,~\ldots,~X_n]~(~not~G~))
\label{eqn:expand_univ_quant}
\\
not~G~&\rightarrow~(if~G~then~false~else~true)
\label{eqn:expand_neg}
\end{align}
A switch is a disjunction in which
each disjunct unifies the same bound variable
with a different function symbol.
Switches in Mercury are thus analogous to switches in languages like C.
If the switch contains a case for each function symbol in the
switched-on value's type, then the switch is said to be \emph{complete}.
Otherwise the switch is \emph{incomplete}.
Note that the representation of switches in Figure~\ref{fig:abstractsyntax}
is not the same as their source code representation;
Their source code syntax is the same as a disjunction.
See page~\pageref{page:purity} for a description of the purity promises.
\label{superhomogeneous}
Note that because a unification goal contains at most one function symbol
then any unifications that would normally have more than one function symbol
are expanded into a conjunction of smaller unifications.
This also happens to any unifications that appear as arguments in calls.
A goal for a single expression such as the quadratic equation:
\begin{verbatim}
X = (-B + sqrt(B^2 - 4*A*C)) / (2*A)
\end{verbatim}
\noindent
Will be expanded into 12 conjoined goals:
\begin{verbatim}
V_1 = 2, V_2 = B ^ V_1, V_3 = 4,
V_4 = V_3 * A, V_5 = V_4 * C, V_6 = V_2 - V_5,
V_7 = sqrt(V_6), V_8 = -1, V_9 = V_8 * B,
V_10 = V_9 + V_7, V_11 = V_1 * A, X = V_10 / V_11
\end{verbatim}
\noindent
As shown here,
constants such as integers must be introduced as unifications between a
variable and a function symbol representing the constant.
Goals flattened in this way are said to be in
\emph{super homogeneous form}.
Mercury has a type system inspired by the type system of Hope~\citep{hope}
and is similar to Haskell's type system~\citep{haskell98};
it uses Hindley-Milner~\citep{hindley69:types,milner78:types} type
inference.
Mercury programs are statically typed; the compiler knows the type of every
argument of every predicate (from declarations or inference) and every local
variable (from inference).
Types can be parametric,
type parameters are written in uppercase and
concrete types are written in lower case.
As an example, the maybe type is defined as:
\begin{verbatim}
:- type maybe(T)
---> yes(T)
; no.
\end{verbatim}
\noindent
A value of this type is a discriminated union;
which is either \code{yes(T)}, indicating the presence of some other value
whose type is \code{T};
or simply \code{no}.
The maybe type is used to describe a value that may or may not be present.
The type \code{maybe(int)} indicates that an integer value might be
available.
The list type uses a special syntax for nil and cons and is recursive:
\begin{verbatim}
:- type list(T)
---> [T | list(T)] % cons
; []. % nil
\end{verbatim}
% Float this so that it does not straddle a page boundary.
\begin{figure}
\begin{verbatim}
:- pred append(list(T), list(T), list(T)).
append([], Ys, Ys).
append([X | Xs], Ys, [X | Zs]) :-
append(Xs, Ys, Zs).
\end{verbatim}
\caption{\code{append/3}'s definition and type signature}
\label{fig:append_type_and_defn}
\end{figure}
\noindent
\code{append/3}'s type signature and definition are shown in
Figure~\ref{fig:append_type_and_defn}.
Mercury also has a strong mode system.
At any given point in the program a variable has an \emph{instantiation
state},
this is one of free, ground or clobbered.
Free variables have no value,
ground variables have a fixed value,
clobbered variables once had a value, but it is no longer available.
Partial instantiation can also occur,
but the compiler does not fully support partial instantiation
and therefore it is rarely used.
Variables that are ground may also be described as unique,
meaning they are not aliased with any other value.
When variables are used, their instantiation state may change;
such a change is described by the initial and final instantiation states
separated by the \code{>>} symbol.
The instantiation states are the same if nothing changes.
Variables can only become \emph{more instantiated}:
the change `free to ground' is legal, but `ground to free' is not.
Similarly, uniqueness cannot be added to a value that is already
bound or ground.
Commonly used modes such as input (\code{in}), output (\code{out}),
destructive input (\code{di}) and unique output (\code{uo}) are
defined in the standard library:
\begin{verbatim}
:- mode in == ground >> ground.
:- mode out == free >> ground.
:- mode di == unique >> clobbered.
:- mode uo == free >> unique.
\end{verbatim}
\noindent
A predicate may have multiple modes.
Here are two mode declarations for the \code{append/3} predicate above;
these are usually written immediately after the type declaration.
\begin{verbatim}
:- mode append(in, in, out) is det.
:- mode append(out, out, in) is multi.
\end{verbatim}
\noindent
When a predicate has a single mode,
the mode declaration may be combined with the type signature as a single
declaration such as the following declaration for \code{length/2},
the length of a list:
\begin{verbatim}
:- pred length(list(T)::in, int::out) is det.
\end{verbatim}
\noindent
The Mercury compiler generates separate code
for each mode of a predicate or function,
which we call a \emph{procedure}.
In fact, the compiler handles individual procedures as separate entities
after mode checking.
Mode correct conjunctions,
and other conjoined goals such as the condition and then parts of an
if-then-else,
must produce values for variables before consuming them.
This means that
variables passed as input arguments must be ground before and after the
call and
variables passed as output arguments must be free before the call and
ground after the call.
Similarly, a variable passed in a destructive input argument will be destroyed
before the end of the call;
it must be unique before the call and clobbered afterwords.
Referencing a clobbered variable is illegal.
A variable passed in a unique output argument must initially be free and
will become unique\footnote{
The \di and \uo modes are often used together to allow the compiler to
destructively update data structures such as arrays.}.
Unifications can also modify the instantiation state of a variable,
Unifying a free term with a ground term will make the first term ground,
and the second term non-unique.
Unifying a ground term with a ground term is an equality test,
it does not affect instantiation.
Unifying two free terms is illegal\footnote{
Aliasing free variables is illegal,
Mercury does not support logic variables.
This is a deliberate design decision;
it makes unification in Mercury much
faster than unification in Prolog.}.
The instantiation state of each variable is known at each point
within a procedure,
therefore
the compiler knows exactly where each variable becomes ground.
The mode checking pass will minimally reorder conjuncts
(in both sequential and parallel conjunctions)
so that multiple procedures created from the same predicate can be mode
correct.
% Mode invariants
The mode system enforces three invariants:
\begin{description}
\item[Conjunction invariant:]
In any set of conjoined goals,
including not just conjuncts in conjunctions
but also the condition and then-part of an if-then-else,
each variable that is consumed by any of the goals
is produced by exactly one earlier goal.
The else part of an if-then-else is not conjoined with the condition part
so this invariant does not apply to the condition and else parts.
\item[Branched goal invariant:]
In disjunctions, switches or if-then-elses,
the goal types that contain alternative branches of execution,
each branch of execution must produce
the exact same set of variables
that are consumed from outside the branched goal,
with one exception:
a branch of execution that cannot succeed (see determinisms below)
may produce a subset of this set of variables.
\item[Negated goal invariant:]
A negated goal may not bind
any variable that is visible to goals outside it,
and the condition of an if-then-else may not bind a variable
that is visible anywhere except in
the then-part of that if-then-else.
\end{description}
\noindent
Each procedure and goal has a determinism,
which puts upper and lower bounds on the number of the procedure's possible
solutions
(in the absence of infinite loops and exceptions).
Mercury's determinisms are:
\begin{description}
\item[\ddet] procedures succeed exactly once
(upper bound is one, lower bound is one).
\item[\dsemidet] procedures succeed at most once
(upper bound is one, no lower bound).
\item[\dmulti] procedures succeed at least once
(lower bound is one, no upper bound).
\item[\dnondet] procedures may succeed any number of times
(no bound of either kind).
\item[\dfailure] procedures can never succeed
(upper bound is zero, no lower bound).
\item[\derroneous] procedures have an upper bound of zero and a lower
bound of one, which means they can neither succeed nor fail.
They must either throw an exception or loop forever.
% We use the cc detisms a little bit, since parallelism may be used in
% cc_multi predicates.
\item[\dccmulti] procedures may have more than one solution, like
\dmulti,
but they commit to the first solution.
Operationally, they succeed exactly once.
\item[\dccnondet] procedures may have any number of solutions, like
\dnondet,
but they commit to the first solution.
Operationally, they succeed at most once.
\end{description}
\noindent
Examples of determinism declarations can be seen above in the declarations for
\code{append/3} and \code{length/2}.
In practice, most parts of most Mercury programs are deterministic (\ddet).
Each procedure's mode declaration
typically includes its determinism (see the mode declarations for
\code{append/3} above).
If this is omitted, the compiler can infer the missing information.
% Switch detection.
\begin{figure}
\parbox{0.5\textwidth}{
$$
\begin{array}{ll}
(\\
& X~=~f_1(\ldots),~G_1 \\
; \\
& X~=~f_2(\ldots),~G_2 \\
; \\
& X~=~f_n(\ldots),~G_n \\
)
\end{array}
$$}%
\parbox{0.5\textwidth}{
$$
\begin{array}{ll}
\hbox{switch}~x~( \\
f_1: \\
& X~=~f_1(\ldots),~G_1 \\
; \\
f_2: \\
& X~=~f_2(\ldots),~G_2 \\
; \\
f_n: \\
& X~=~f_n(\ldots),~G_n \\
)
\end{array}
$$}
\caption{Switch detection example}
\label{fig:switch_detect}
\end{figure}
Before the compiler attempts to check or infer
the determinism of each procedure,
it runs a switch detection algorithm that looks for disjunctions
in which each disjunct unifies the same input variable
(a variable that is already bound when the disjunction is entered)
with any number of different function symbols.
Figure~\ref{fig:switch_detect} shows an example.
When the same variable is used with multiple levels of unification in a
disjunction switch detection will rely on
a form of common subexpression elimination
to generate nested switches.
The point of this is that it allows determinism analysis
to infer much tighter bounds on the number of solutions of the goal.
For example, if each of the $G_i$ is deterministic
(i.e. it has determinism \ddet)
and the various $f_i$ comprise all the function symbols in $X$'s type,
then the switch can be inferred to be deterministic as well,
whereas a disjunction that is \emph{not} a switch and produces at least one
variable
cannot be deterministic,
since each disjunct may generate a solution.
Disjunctions that are not converted into switches can be classified into two
separate types of disjunction.
If a disjunction produces one or more variables
then it may have more than one solution
(it will be \dnondet, \dmulti, \dccnondet or \dccmulti).
If a disjunction does not produce any variables
then it has at most one solution
(it will be either \dsemidet or \ddet).
Even when there are different execution paths that may succeed,
the extra execution paths do not affect the program's semantics.
In practice disjunctions that produce no values are \dsemidet
(if they where \ddet the programmer would have not written them)
therefore we call them \emph{semidet disjunctions}.
The compiler will generate different code for them,
since they never produce more than one solution.
Mercury has a module system.
Calls may be qualified by the name of the module
that defines the predicate or function being called,
with the module qualifier and the predicate or function name
separated by a dot.
The \io (Input/Output) module of the Mercury standard library
defines an abstract type called the \io state,
which represents the entire state of the world outside the program.
The \io module also defines a large set of predicates that do I/O.
These predicates all have determinism \ddet.
and besides other arguments,
they all take a pair of \io states
whose modes are respectively \di and \uo
(as discussed above, \di being shorthand for \emph{destructive input}
and \uo for \emph{unique output}).
The \code{main/2} predicate that represents the entire program
(like \code{main()} in C)
also has two arguments, a \code{di,uo} pair of \io states.
A program is thus given
a unique reference to the initial state of the world,
every I/O operation conceptually destroys the current state of the world
and returns a unique reference to the new state of the world,
and the program must return the final state of the world
as the output of \code{main/2}.
Thus, the program (\code{main/2}) defines a relationship between the state
of the world before the program was executed,
and the state of the world when the program terminates.
These conditions guarantee that at each point in the execution
there is exactly one current state of the world.
% Float this so that it does not straddle a page boundary
% XXX: Subfigure.
\begin{figure}
\begin{center}
\begin{minipage}[b]{0.49\textwidth}
\subfigure[Normal notation]{%
\label{fig:hello_normal}
\begin{tabular}{l}
\code{:- pred main(io::di, io::uo) is det.} \\
\\
\code{main(S0, S) :-} \\
\code{~~~~io.write\_string("Hello ", S0, S1),} \\
\code{~~~~io.write\_string("world{\textbackslash}n", S1, S).}
\end{tabular}}
\end{minipage}
%
\begin{minipage}[b]{0.49\textwidth}
\subfigure[State variable notation]{%
\label{fig:hello_sv}
\begin{tabular}{l}
\code{:- pred main(io::di, io::uo) is det.} \\
\\
\code{main(!IO) :-} \\
\code{~~~~io.write\_string("Hello ", !IO),} \\
\code{~~~~io.write\_string("world{\textbackslash}n", !IO).}
\end{tabular}}
\end{minipage}
\end{center}
\caption{Hello world and I/O example}
\label{fig:hello}
\end{figure}
As an example, a version of ``Hello world'' is shown in
Figure~\ref{fig:hello_normal}.
\code{S0} is the initial state of the world,
\code{S1} is the state of the world after printing ``\code{Hello }'',
and \code{S} is the state of the world
after printing ``\code{world\n}'' as well.
Note that the difference e.g.\ between \code{S1} and \code{S}
represents not just the printing of ``\code{world\n}'',
but also all the changes made to the state of the world
by \emph{other} processes since the creation of \code{S1}.
This threaded state sequentialises I/O operations:
``\code{world\n}'' cannot be printed until the value for \code{S1}
is available.
%\peter{This is not relevant to your work, unless you are intent on
% including code with state variables in your discussion. I think
% it is a complication better left out.}
Numbering each version of the state of the world
(or any other state that a program may pass around) is cumbersome.
Mercury has syntactic sugar to avoid the need for this,
but this sugar does not affect
the compiler's internal representation of the program.
This syntactic sugar is shown below, along with the convention of naming
the I/O state \code{!IO}.
The compiler will transform the example in Figure~\ref{fig:hello_sv} into the
example in Figure~\ref{fig:hello_normal}.
Any term beginning with a bang (\code{!})
will be expanded into a pair of automatically named
variables.
The variable names created sequence conjunctions from left to right in
conjunctions when used with pairs of \code{di/uo} or \code{in/out}
modes.
%\peter{End of stuff I think you should leave out.}
\label{page:purity}
Mercury divides goals into three purity categories:
\begin{description}
\item[pure goals] have no side effects
and their outputs do not depend on side effects;
\item[semipure goals] have no side effects
but their outputs may depend on other side effects;
\item[impure goals] may have side effects, and may produce outputs
that depend on other side effects.
\end{description}
\noindent
Semipure and impure predicates and functions
have to be declared as such,
and calls to them must be prefaced with either
\code{impure} or \code{semipure} (whichever is appropriate).
The vast majority of Mercury code is pure,
with impure and semipure code confined to a few places
where it is used to implement pure interfaces.
(For example, the implementations of the all-solutions predicates
in the Mercury standard library use impure and semipure code.)
Programmers put
\code{promise\_pure} or \code{promise\_semipure} wrappers around a goal
to promise to the compiler (and to human readers) that
the goal as a whole is pure or semipure respectively,
even though some of the code inside the goal may be less pure.
The compiler keeps a lot of information associated with each goal,
whether atomic or not.
This includes:
\begin{itemize}
\item
the set of variables bound (or \emph{produced}) by the goal;
\item
the determinism of the goal.
\item
the purity of the goal
\item
the \emph{nonlocal set} of the goal,
which means the set of variables
that occur both inside the goal and outside it; and
\end{itemize}
\noindent
The language reference manual \citep{mercury_refman} contains a complete
description of Mercury.