-
Notifications
You must be signed in to change notification settings - Fork 232
Notes on optimizations for type inference and memoization
Notes from a discussion with Leo about type inference and memoization of type-checking
- An efficient, optimistic type-checker
When computing a solution to a meta-variable (G |- ?m : t) := e
,
Lean 3 uses a fast check to ensure that e
has type t
in context
G
.
In Lean, this fast check is called infer_type
with the property that
if infer_type G e = t
then if e
is well-typed, then it has
type t
in context G
.
For example, this means that if
e = f e1 .. en
and
f : (x1:t1) -> ... -> (xn:tn) -> t
then infer_type G e
returns t [e1..en/x1..xn]
after only inferring
the type of f
and not inspecting any of the arguments e1 .. en
.
This relies on a meta-property of the unification algorithm, which is
that any assignment to a meta-variable computed by the unification
algorithm is well-typed in the environment of the meta-variable---what
needs to be checked is that it is well-typed at the expected type of
the meta-variable and infer_type
efficiently computes a type which
can then be compared to the expected type for definitional equality.
Some examples from F*:
a.
t : int -> Type
f : #x:nat -> t x -> Type
z : t -1
====================================
f #?u z
In this context, we would solve ?u := -1
and infer_type _ -1
returns int
since - : int -> int
.
So, the fast check fails (rightfully) and one can fall back on the current slow path, which should also rightfully fail.
b.
t : int -> Type
f : #x:nat -> t x -> Type
y : int
z : t y
====================================
if (y >= 0) then f #?u z
In this context, (?u := y)
and infer_type _ y
returns int
since y : int
.
So, the fast check fails and one can fall back on the current slow path, which, this time, would succeed, correctly.
c. Consider applying a split
tactic to a goal of the form
(a /\ b) /\ (c /\ d)
where a, b, c, d
are large terms
and split: #p:Type -> #q:Type -> p -> q -> p /\ q
. This time,
we have ?p := (a /\ b)
and ?q = (c /\ d)
and infer_type _ (a /\ b)
quickly returns Type
just from the type of /\
, independently of a
and b
. Likewise for (c /\ d)
.
- Memoization of type-checking
The optimization above is not applicable in the Lean kernel, since meta-variables do not even appear in the kernel.
Consider again a goal of the form (a /\ b) /\ (c /\ d)
for large
terms a, b, c, d
.
And consider splitting the goal twice producing a proof term of the form
split #(a /\ b) #(c /\ d) (split #a #b pf_a pf_b) (split #c #d pf_c pf_d)
where `split: #p:Type -> #q:Type -> p -> q -> p /\ q`
Checking the proof term above using F*'s naive strategy is very
inefficient, since it requires type-checking the large term a
multiple times.
The Lean kernel memoizes the type-checking of terms but using a locally nameless representation of terms (like F*) where every name in a term is accompanied by its type (i.e., the sort field of a bv). This means that terms can be type-checked independently of their environment.
Note, this only works for terms that are known to be well-scoped to
begin with. So, when type-checking a term e
in a context G
, the
free variables of e
(which are memoized in the representation of
e
, as in F*) are checked to make sure they are included in G
and
then e
is type-checked independently of G
, with memoization.
See this page for more discussions on memoization.
- Representation of meta-variables used in unification
In F* currently (as in Lean 2) a meta-variable introduced in
context is introduced at a pi-type. i.e., in a context G, a
meta-variable ?m is introduced at type G -> t
and, typically,
we generate a unification problem of the form
(?m : G -> t) (x0 : t0) ... (xn : tn) =?= e
This is representation, although very general, has many downsides:
-
It is very redundant. The entire context
G
is replicated in the(x0:t0) .. (xn:tn)
and, worse, occurrences of other meta-variables of a similar shape inti
lead to a combinatorial blowup. -
Even a simple solution for
?m
produces a large lambda term, which must be checked. -
Checking for various common special cases, e.g., checking to see if (?m : G -> t) (x0 : t0) ... (xn : tn) is a pattern involves a quadratic check of the
xi
to check for no duplicates.
Instead, in Lean 3, a meta-variable introduced in a context has
the form G |- ?m : t
, i.e., a meta-variable introduced in
context G
at type t
.
a. This avoids redundancy, since the context G
is shared as
usual and no additional occurrences of the bindings in G
are
produced.
b. Simple solutions for ?m
do not introduce large lambda terms.
c. A meta-variable in the form G |- ?m : t
is always a pattern,
no check required.