-
Notifications
You must be signed in to change notification settings - Fork 232
Compiling tactics
Summary of a discussion between NS, Guido and JP on 05/05.
This is more of an old design document, to see how to compile tactics in practice go here
A reminder on OCaml file extensions.
- .cmi interface binary description
- .cmo bytecode object file
- .cma byte archive (several .cmo)
- .cmx native object
- .cmxa native archive (several .cmx and .o)
- .cmxs dynamically-loadable native object file (see https://caml.inria.fr/pub/docs/manual-ocaml/libref/Dynlink.html)
fstar.exe
contains several .cmx
's: FStar_Tactic.cmx
, FStar_Util.cmx
, NativeTactics.cmx
.
To compile a tactic, one needs to:
- extract it into
mytactic.ml
- using
ocamlopt -shared -I ...
, compile it against the.cmx
's,.mli
's and.cmi
's above to obtainmytactic.cmxs
The steps above may be automated by fstar.exe
itself, but we can assume that they are performed manually for now.
Then, the resulting .cmxs
shall be loaded into fstar.exe
. Again, this may be performed automatically, but we may assume as a temporary stopgap that the user calls fstar.exe -load mytactic.cmxs
; the tactic references several modules such as FStar_Tactic
; the dynamic loader resolves this against the corresponding .cmx
already present in fstar.exe
and everyone is happy.
Naturally, one should not screw things up, and the cmx
s used for the build of mytactic.cmxs
shall be exactly those linked into fstar.exe
One thing to note is that the Dynlink
module does not provide a "lookup function by name" feature; instead, we suggest the following NativeTactics.mli
:
val register_prim_step: string -> int -> (list term -> option term) -> unit
val register_tactic: string -> int -> (list term -> tactic term) -> unit
val find: string -> (list term -> option term) option
Then, NativeTactics.cmx
is linked into fstar.exe
, and at tactic-extraction time, fstar.exe
appends to the generated mytactic.ml
a few calls to NativeTactics.register_...
.
Consider a parameterized tactic val f: nat -> Tac term
. When the user writes by_tactic (f 2)
, a type-checking time, a VC is generated which contains a tactic sub-term of the form Tm_app (Tm_fvar "f", Tm_constant "2")
. F* then:
- deeply embeds the environment as a
term
- calls the normalize to reduce the term above applied to the environment
- gets back a list of sub-goals to process, along with an updated environment which is de-embedded.
The gist of the issue is that the natively-compiled f
present in mytactic.cmxs
has type val f: Prims.nat -> (env -> goal -> (env * goal list) option)
(roughly), but the F* typechecker manipulates term
s, meaning that one has to de-embed and re-embed when crossing the boundary between the F* typechecker and the compiled code.
We want, at tactic-extraction time, to generate the embeddings and de-embeddings. For instance:
... contents of mytactic.ml ... definition of f ...
NativeTactics.register "f" (embed (Arrow Nat, ...) f
^^^^^^^^^
a "dyn" type that describes the type of `f`, written
out by F*'s extraction
This doesn't work, however, because i) register would have to use a lot of Obj.magic's (no way to refine the type of f
based on the dynamic type and ii) it doesn't work for the function case because we need to call back into the normalizer.
Instead, we symbolically evaluate the embed
function and generate specialized bits of code for each f
and its type.
We assume f
fits within the Hindley-Milner type system (prenex polymorphism). Then,
embed_fun (dembed_a: term -> a) (embed_b: b -> term) (f: a -> b): term -> term =
fun x ->
embed_b (f (dembed_a x))
embed_nat (n: nat): term =
Tm_constant (string_of_nat n)
embed_bool (b: bool) =
...
embed_var (x: term): term = x
dembed_fun (embed_a: a -> term) (dembed_b: term -> b) (f: term) =
fun (x: a) ->
dembed_b (normalize (Tm_app (f, embed_a)))
dembed_nat (n: term): nat =
match n with Tm_constant (Const_int n) -> int_of_string n | _ -> assert false
dembed_bool (b: term): bool =
...
dembed_var (x: term): term = x
The "var" case is the most interesting. It corresponds to a tactic that has type, say:
val g: #a -> a -> (a -> bool) -> Tac term
a
is a type variable; we don't know how to embed/dembed it! But, by parametricity, g
ought to be able to deal with any such a
(remember that g
is in the Hindley-Milner subset, this wouldn't work with dependent types, as g
may refine the a
based on the value of some other parameter). We pick a = term
which generates the identity embedding/de-embedding.
We chatted about having more sophisticated versions of these, but this seems overkill and it looks like we're ok with g
not having fancy types.
As a consequence of previous sections, most tactics may be compiled natively, but some of them (not expressible in Hindley-Milner) are not expected to be extractable. As such, we need to consider to call seamlessly (interpreted or native) tactics from native tactics. The proposal is to have for every tactics mytactic
(interpreted or not) :
- an ocaml symbol
mytactic
containing either the definition of the tactic (if it can be extracted) or a call into the interpreter with the relevant tactic (if it can only be interpreted) - an F* symbol
mytactic
corresponding either the reflection ofreified_mytactic
(if it can be extracted) or the original definition (if it can only be interpreted) - an F* symbol
reified_mytactic
which is either assumed and correspond to the ocaml code if it can be extracted, or the reification ofmytactic
if it can only be interpreted (not sure that this last case have any use)
[Another solution which has been proposed is to call into the interpreter at every tactic calls in a native tactic. This seems to be slightly wasteful in term of resources and more importantly not that evident to implement (need to hack a special case of elimination of reify
when doing normalization of reification)]
- That the parametricity argument is solid.
- There is a "semantics" recursion loop, because the semantics of the embedded language, due to the presence of
normalize
, "call back" into the semantics of the original language. - Use EMF*, write the EMF* interpreter into EMF*, then reduce the interpreter applied to an embedded term... this is similar to JIT compilation. Anything in that area?
- Typed self-interpreters, but we're not really interested in the well-typedness of it, just like semantics preservation...