Compiling tactics

Summary of a discussion between NS, Guido and JP on 05/05.

This is more of an old design document, to see how to compile tactics in practice go here

Compiling tactics

A reminder on OCaml file extensions.

.cmi interface binary description
.cmo bytecode object file
.cma byte archive (several .cmo)
.cmx native object
.cmxa native archive (several .cmx and .o)
.cmxs dynamically-loadable native object file (see https://caml.inria.fr/pub/docs/manual-ocaml/libref/Dynlink.html)

fstar.exe contains several .cmx's: FStar_Tactic.cmx, FStar_Util.cmx, NativeTactics.cmx.

To compile a tactic, one needs to:

extract it into mytactic.ml
using ocamlopt -shared -I ..., compile it against the .cmx's, .mli's and .cmi's above to obtain mytactic.cmxs

The steps above may be automated by fstar.exe itself, but we can assume that they are performed manually for now.

Then, the resulting .cmxs shall be loaded into fstar.exe. Again, this may be performed automatically, but we may assume as a temporary stopgap that the user calls fstar.exe -load mytactic.cmxs; the tactic references several modules such as FStar_Tactic; the dynamic loader resolves this against the corresponding .cmx already present in fstar.exe and everyone is happy.

Naturally, one should not screw things up, and the cmxs used for the build of mytactic.cmxs shall be exactly those linked into fstar.exe

One thing to note is that the Dynlink module does not provide a "lookup function by name" feature; instead, we suggest the following NativeTactics.mli:

val register_prim_step: string -> int -> (list term -> option term) -> unit
val register_tactic: string -> int -> (list term -> tactic term) -> unit
val find: string -> (list term -> option term) option

Then, NativeTactics.cmx is linked into fstar.exe, and at tactic-extraction time, fstar.exe appends to the generated mytactic.ml a few calls to NativeTactics.register_....

Embeddings, two-way communication between normalizer and native tactics

Consider a parameterized tactic val f: nat -> Tac term. When the user writes by_tactic (f 2), a type-checking time, a VC is generated which contains a tactic sub-term of the form Tm_app (Tm_fvar "f", Tm_constant "2"). F* then:

deeply embeds the environment as a term
calls the normalize to reduce the term above applied to the environment
gets back a list of sub-goals to process, along with an updated environment which is de-embedded.

The gist of the issue is that the natively-compiled f present in mytactic.cmxs has type val f: Prims.nat -> (env -> goal -> (env * goal list) option) (roughly), but the F* typechecker manipulates terms, meaning that one has to de-embed and re-embed when crossing the boundary between the F* typechecker and the compiled code.

Conceptually (doesn't type-check)

We want, at tactic-extraction time, to generate the embeddings and de-embeddings. For instance:

... contents of mytactic.ml ... definition of f ...
NativeTactics.register "f" (embed (Arrow Nat, ...) f
                                   ^^^^^^^^^
                                   a "dyn" type that describes the type of `f`, written
                                   out by F*'s extraction

This doesn't work, however, because i) register would have to use a lot of Obj.magic's (no way to refine the type of f based on the dynamic type and ii) it doesn't work for the function case because we need to call back into the normalizer.

Symbolically evaluating embed

Instead, we symbolically evaluate the embed function and generate specialized bits of code for each f and its type.

We assume f fits within the Hindley-Milner type system (prenex polymorphism). Then,

embed_fun (dembed_a: term -> a) (embed_b: b -> term) (f: a -> b): term -> term =
  fun x ->
    embed_b (f (dembed_a x))
embed_nat (n: nat): term =
  Tm_constant (string_of_nat n)
embed_bool (b: bool) =
  ...
embed_var (x: term): term = x

dembed_fun (embed_a: a -> term) (dembed_b: term -> b) (f: term) =
  fun (x: a) ->
    dembed_b (normalize (Tm_app (f, embed_a)))
dembed_nat (n: term): nat =
  match n with Tm_constant (Const_int n) -> int_of_string n | _ -> assert false
dembed_bool (b: term): bool =
  ...
dembed_var (x: term): term = x

The "var" case is the most interesting. It corresponds to a tactic that has type, say:

val g: #a -> a -> (a -> bool) -> Tac term

a is a type variable; we don't know how to embed/dembed it! But, by parametricity, g ought to be able to deal with any such a (remember that g is in the Hindley-Milner subset, this wouldn't work with dependent types, as g may refine the a based on the value of some other parameter). We pick a = term which generates the identity embedding/de-embedding.

More complicated type signatures for tactics

We chatted about having more sophisticated versions of these, but this seems overkill and it looks like we're ok with g not having fancy types.

Calling tactics from tactics

As a consequence of previous sections, most tactics may be compiled natively, but some of them (not expressible in Hindley-Milner) are not expected to be extractable. As such, we need to consider to call seamlessly (interpreted or native) tactics from native tactics. The proposal is to have for every tactics mytactic (interpreted or not) :

an ocaml symbol mytactic containing either the definition of the tactic (if it can be extracted) or a call into the interpreter with the relevant tactic (if it can only be interpreted)
an F* symbol mytactic corresponding either the reflection of reified_mytactic (if it can be extracted) or the original definition (if it can only be interpreted)
an F* symbol reified_mytactic which is either assumed and correspond to the ocaml code if it can be extracted, or the reification of mytactic if it can only be interpreted (not sure that this last case have any use)

[Another solution which has been proposed is to call into the interpreter at every tactic calls in a native tactic. This seems to be slightly wasteful in term of resources and more importantly not that evident to implement (need to hack a special case of elimination of reify when doing normalization of reification)]

What to prove about this

That the parametricity argument is solid.
There is a "semantics" recursion loop, because the semantics of the embedded language, due to the presence of normalize, "call back" into the semantics of the original language.
Use EMF*, write the EMF* interpreter into EMF*, then reduce the interpreter applied to an embedded term... this is similar to JIT compilation. Anything in that area?
Typed self-interpreters, but we're not really interested in the well-typedness of it, just like semantics preservation...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compiling tactics

Compiling tactics

Embeddings, two-way communication between normalizer and native tactics

Conceptually (doesn't type-check)

Symbolically evaluating embed

More complicated type signatures for tactics

Calling tactics from tactics

What to prove about this

Clone this wiki locally