-
Notifications
You must be signed in to change notification settings - Fork 232
Tracking and reporting source locations in SMT errors
-
Definition of ranges:
F* ranges contain two components: a def_range and a use_range. Both are triples of
(file, start_pos, end_pos)
, and eachpos
is a pair of a(line, col)
. In what follows, I'll just forget about this structure and just use ranges of the form(d,u)
, whered
is a def range andu
is a use range, both represented by single integers throughout this document. -
Ranges produced by the parser and tosyntax
Terms are tagged with ranges as they are produced by the parser, and as they are constructed, initially, every term has the same def_range and use range. I.e., all ranges produced by the parser are of the form
(d,d)
.In what follows, I will decorate F* source terms as
e@(d,u)
to denote a terme
at range(d,u)
. -
Def/use distinction during typechecking
During typechecking, when a variable is looked-up in the environment, its use range is updated. For example:
let f@(0,0) : t_f@(1,1) = e
The range information of the definition site of
f
is(0, 0)
.Now, let's say we have an occurrence of
f@(2,2)
. At that point, we look upf
from the environment, find itf@(0,0)
, and return a pair of a found occurrence off@(0,2)
and its typet_f@(1,2)
.Note, the found occurrence of
f
has its range updated to(0,2)
, indicating that this occurrence off
was defined at0
and used at2
.The type of
f
, i.e.,t_f
also has its ranges updated. In particular, we callSubst.set_use_range (2, 2) t_f
. This has the effect of updates the use range of every sub-term oft_f@(1,2)
to2
. -
Verification condition generation
The F* typechecker builds verification conditions that consist of assertions in a context. Each assertion is an AST node of the form
Tm_meta(p, Meta_labeled(msg, range, false))
.The range is the range of the term that provoked the assertion.
Continuing the example above, if
t_f@(1,2) = x:int{(pre x)@(3,2)} -> unit
and we apply
f@(0,2)
to0@(4,4)
we produce an assertion in the VC
Tm_meta ((pre 0)@(3,2), Meta_labeled("refinement subtyping", (4,4), false))
Question: Check what happens when instead of a constant
0@(4,4)
the application is to a variabley@(4,5)
with a distinct def and use range. I think we just use the range prior to def/use distinction. In fact, I suspect an invariant of theMeta_labeled
form is that its range field is always of the form(d,d)
, i.e., it is range that does not distinguish def/use.Note, the
false
is a relic from an earlier version where we used to flag the polarity of assertions (corresponding to Z3's lblpos and lblneg), but this is not longer used.Ideally, every formula in positive position should be labeled. However, we suspect that some assertions fall through the cracks and do not get labeled.
-
Query labeling
Prior to sending the query to Z3, FStar.ErrorReporting.label_goals makes a pass over the query instrumenting each assertion.
Note, I am describing this transformation in terms of core syntax terms. In reality, this phase occurs after a translation to another AST specific to the SMT encoding. But, the same
The depth-first traversal of a query AST by label goals has a notion of "enclosing context", a pair of an error message and an optional range. The initial enclosing context is just
("assertion failed", None)
When label_goals encounters a node of the form
Tm_meta (phi, Meta_labeled(msg, rng, _))
It descends into the
phi
withenclosing=(msg, Some rng)
. Note, subject to my conjectured invariant above, the range of an enclosing context, if set, does not distinguish def/use.When encountering an unsplittable goal
p
(these include literals, but also disjunctions, iff, exists etc.), we instrument them with a labell
, producingl \/ p
, associating withl
an error message and range.For example, when encountering the literal
(pre 0)@(3,2)
withenclosing=("refinement subtyping", (4,4))
),label_goals
instruments the literal with a fresh, unintepreted boolean,l
producing the formula``` l \/ pre 0 ```
And associates with the label
l
the error message from the enclosing context (e.g., "refinement subtyping")I also associates a range with the label
l
and this is computed by:a. if enclosing.range = None, then it is just the range of the goal term, e.g.,
(3,2)
in our exampleb. otherwise, if the use range of the goal term is included within the range of the enclosing context, then just include the range of goal term unchanged. One rationale for why we do this is that otherwise we would widen the range needlessly, losing precision. Note, the use_range is what appears as the primary location in the final error message.
c. Otherwise, we use the range of the literal with its def_range updated to the range of the enclosing label. i.e, in our example, we would produce
(4, 2)
, signifying that the error messages should signal2
as its primary location (i.e., the application sitef@(2,2)
) and4
as its secondary location.Case (c) is unusual, but it does come up, for instance, with user-defined labels (see section 7 below). In this case, it may well be that the use range of goal term is unrelated to the range of the enclosing context.
For splittable formulae:
-- conjunctions `p /\ q`, label goals descends on both left and right -- implications `p ==> q`, descend on the RHS only -- universal `forall x. p`, descend after opening the quantifier -- let-binding `let x = e in p` descend into `p`.
There is also some special treatment for a specific and common form of formula produced by the VC generator for testing implication among WPs. But, I'll leave that out for now.
-
Extracting and reporting errors
After label instrumentation, a query is fed to the SMT solver. In case it fails, we look for a counterexample model.
For any label variable
l
set totrue
in the counterexample, we retrieve its corresponding(message, range)
and report an Errors.Error_Z3SolverError (Error 19) for each of them.Let
q
correspond to the the range of the entire top-level term that produced the query. This is typically a large range spanning the entire top-level term.Given a
{msg=lm; rng=lr}
associated with a label, the error report is a pair{msg=msg; rng=rng}
computed as follows:-
If
lr
is(0,0)
(dummy_range), we ignore it. We setmsg=lm
andrng=q
. This leads to the commonly seen behavior of a large term highlighted as an error in the F* ide.TODO: It would be useful to instrument F* and collect all labels that get assigned a dummy_range. A run of F* with
--admit_smt_queries true
should suffice, since queries are instrumented with labels in this case anyway. Thereafter, it would be useful to triage all the occurrences of dummy_range labels and figure out how to track ranges better. -
If the file name of
lr
points to a file that is different from the file ofq
then we do not report it as the error location. Instead, we concatenate "(Also see: lm.use_range)" to thelm
as the value of themsg'
and setrng=q
.This is to make sure that we do not report a file aside from the current file as the main error location. Typically, otherwise, F* ide, which expects error locations to refer to the current buffer gets confused.
Note: This is being revised to also report the lm.def_range since it may also contain useful information.
-
Otherwise, we set
msg, rng
tolm, lr
.
Finally, given
(msg, rng)
, we report it as an error using FStar.Errors.add_errors, which prints the message, resulting in:-- `rng.use_range` as the primary error location -- and in case, `rng.def_range <> rng.use_range`, it reports "(see also rng.def_range)`
-
-
User-defined labeling
An additional complication is that users can augment ranges and labels by decorating a type
p
withPrims.labeled r msg p
. Note, the ranger
does not distinguish def/use sites (since there is no way to construct those usingPrims.mk_range
).These user-defined labeled terms are uninterpreted and receive no special treatment in the compiler until it reaches the SMT encoding (i.e., step 5).
There, any occurrence of
(labeled r msg p)@_)
is turned intoTm_meta(p, Meta_labeled(msg, r, false))@r
And is processed as usual.
Note, a user-defined label can hide other labels.
E.g., if you have
Tm_meta (labeled msg r p, Meta_labeled(msg', r', _))
This is encoded as
Tm_meta (Tm_meta(p, Meta_labeled(msg, r, _)), Meta_labeled(msg', r', _))
But, the enclosing context feature in step 5 effectively hides the outer label.