Counterexamples As Assumptions #5013

Dargones · 2024-01-25T03:19:32Z

Problem

Right now Dafny reports counterexamples using special syntax that may be difficult to understand. For example, consider the following Dafny program that defines a standard List datatype with a function View that maps the list to a sequence of integers:

datatype Node = Cons(next:Node, value:int) | Nil {
  function View():seq<int> 
    ensures Cons? ==> |View()| > 0 && View()[0] == value && View()[1..] == next.View()
  {
    if Nil? then [] else [value] + next.View()
  }
}

Suppose we were to (incorrectly) assert that the list cannot correspond to the sequence of integers [1, 2, 3], like so:

method m(list:Node) {
  assert list.View() != [1, 2, 3];
}

Currently, Dafny would return the following counterexample:

At "method m(list:Node) {" (file.dfy:20):
    list:Problem.Node = Cons(next := @0, value := 1)
    @0:Problem.Node = Cons(next := @2, value := 2)
    @2:Problem.Node = Cons(next := @4, value := 3)
    @4:Problem.Node = Nil

This counterexample is confusing because it does not explain what @0, @2, etc. mean. The notation differs from Dafny syntax, and it cannot express some of the information that the counterexample actually contains. In particular, the counterexample constrains the value returned by calling View() on the list variable. This constraint might be redundant in this particular example, but in general we would want to capture all the information contained in the counterexample.

Solution

This PR redesigns the counterexample generation functionality so that counterexamples are represented internally and can be printed as Dafny assumptions. For example, for the program above, the counterexample will now look like this:

assume Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1) == list 
    && Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1).View.requires() 
    && [1, 2, 3] == Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1).View() 
    && Node.Cons(Node.Cons(Node.Nil, 3), 2).View.requires() 
    && [2, 3] == Node.Cons(Node.Cons(Node.Nil, 3), 2).View() 
    && Node.Cons(Node.Nil, 3).View.requires() 
    && [3] == Node.Cons(Node.Nil, 3).View() 
    && Node.Nil.View.requires() 
    && [] == Node.Nil.View();

Although it is admittedly more verbose because it explores the return values of functions, this counterexample precisely constrains the argument that leads to the assertion violation. In particular, if you insert this assumption into the code and negate the assertion, Dafny verifies that the counterexample is correct.
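A sketch of that check, reusing the Node datatype and method m from above, with the reported assumption inserted and the assertion flipped from != to ==. The method name mWithCounterexample is hypothetical, and depending on the Dafny version the bare assume may trigger a warning about a missing {:axiom} attribute:

```dafny
datatype Node = Cons(next: Node, value: int) | Nil {
  function View(): seq<int>
    ensures Cons? ==> |View()| > 0 && View()[0] == value && View()[1..] == next.View()
  {
    if Nil? then [] else [value] + next.View()
  }
}

// Hypothetical name; the body is the reported counterexample pasted in as-is.
method mWithCounterexample(list: Node) {
  assume Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1) == list
    && Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1).View.requires()
    && [1, 2, 3] == Node.Cons(Node.Cons(Node.Cons(Node.Nil, 3), 2), 1).View()
    && Node.Cons(Node.Cons(Node.Nil, 3), 2).View.requires()
    && [2, 3] == Node.Cons(Node.Cons(Node.Nil, 3), 2).View()
    && Node.Cons(Node.Nil, 3).View.requires()
    && [3] == Node.Cons(Node.Nil, 3).View()
    && Node.Nil.View.requires()
    && [] == Node.Nil.View();
  // The original assertion was list.View() != [1, 2, 3]; under the assumption
  // above, its negation is expected to verify.
  assert list.View() == [1, 2, 3];
}
```

If the flipped assertion verifies, the assumption does describe an input on which the original assertion fails.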

At a high level, this makes the following changes:

- Represent counterexamples as Dafny statements that the user can insert directly into their code to debug the failing assertion.
- Make sure the counterexamples are well-formed, i.e., the constraints are ordered in such a way that the resulting assumption verifies.
- Support counterexample constraints over function return values (an example of something that can really only be done using native Dafny syntax).
- Automatically fix counterexamples that are internally inconsistent, when such an inconsistency can be easily detected. For instance, a counterexample will never describe negative indices of an array or sequence, call a set empty if it contains elements, etc. All of these are possible in the counterexample model returned by Boogie, but we prune such inconsistencies before reporting the counterexample, when possible.
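To illustrate why the ordering of constraints matters for well-formedness, here is a small sketch that is not taken from the PR; the method names and the sequence s are hypothetical. Dafny checks each conjunct under the conjuncts to its left, so a length constraint has to precede any indexing that depends on it:

```dafny
method WellFormedOrder(s: seq<int>) {
  // Accepted: |s| > 1 is established first, so s[1] is known to be in range.
  assume |s| > 1 && s[1] == 3;
  assert s[1] == 3;
}

method IllFormedOrder(s: seq<int>) {
  // Rejected with an "index out of range" error: s[1] is evaluated
  // before anything bounds |s|.
  assume s[1] == 3 && |s| > 1;
}
```

Both assumptions state the same facts; only the first one can be pasted into a program without introducing a new verification error.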

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Dargones · 2024-03-20T22:00:44Z

Edit: right now, one can only get a counterexample in the form of assumptions. To enable test generation behavior, the counterexample model would need to be cross-referenced with the source code, so this is something that could be left as a future PR.

keyboardDrummer · 2024-03-21T12:00:04Z

> Edit: right now, one can only get a counterexample in the form of assumptions. To enable test generation behavior, the counterexample model would need to be cross-referenced with the source code, so this is something that could be left as a future PR.

I think that would be valuable, because if it can be generated, and it does cause an assertion to fail at runtime, it provides a guaranteed counter-example, instead of a counter-example guess (the model produced after an unknown result). It seems like the only way we're aware of to determine if a counter-example guess is correct.

Dargones · 2024-03-21T18:07:45Z

I have just pushed a commit that adds a warning to notify the user that the counterexample may not be valid due to the solver returning UNKNOWN. I would prefer to leave counterexample execution for future work, since it would be a substantial addition to the PR. Executing a counterexample is, I think, indeed the only certain way to determine that it is valid, but it is also not clear how applicable such a feature would be: a lot of Dafny code is not intended for execution, and code that is intended for execution can fail verification for reasons that would not come up at runtime.

keyboardDrummer · 2024-03-22T10:46:37Z

docs/DafnyRef/UserGuide.md

+  Note that Danfy cannot guarantee that the counterexample
+  it reports provably violates the assertion or that the assumptions are not
+  mutually inconsistent (see [^smt-encoding]), so this output should be 
+  expected manually and treated as a hint.


expected => inspected

keyboardDrummer · 2024-03-22T10:47:52Z

> I have just pushed a commit that adds a warning to notify the user that the counterexample may not be valid due to the solver returning UNKNOWN. I would prefer to leave counterexample execution for future work, since it would be a substantial addition to the PR. Executing a counterexample is, I think, indeed the only certain way to determine that it is valid, but it is also not clear how applicable such a feature would be: a lot of Dafny code is not intended for execution, and code that is intended for execution can fail verification for reasons that would not come up at runtime.

Sounds good, thanks. I'm not sure what the best terminology is for these potential counterexamples, but we can iterate on that.

keyboardDrummer · 2024-03-22T10:49:21Z

docs/DafnyRef/UserGuide.md

-  where to write the counterexample, as well as the
-  `-proverOpt:O:model_compress=false` and
-  `-proverOpt:O:model.completion=true` options.
+* `-extractCounterexample` - if verification fails, report a possible


subjective nitpick: possible => potential

keyboardDrummer · 2024-10-21T10:29:02Z

@Dargones This PR changes the counterexample API for the Dafny server. Do you know if it would be possible to get the old API back? I think that for the IDE, an API that shows only concrete values or holes would be ideal.

I imagine an API like:

using System.Collections.Generic;

// Position is assumed to be an existing source-position type.
record CounterExampleItem(Position Position, IDictionary<string, Value> Variables, IDictionary<int, Value> Heap);
interface Value { }
record Primitive(string content) : Value;
record Unknown() : Value;
record Reference(int number) : Value;
// description could be the name of the type, or just "object"
record StructuredValue(string description, IDictionary<string, Value> children) : Value;

At some point I think counterexamples in the IDE should be shown using the debugger UI, which would mean we'd have to implement the debug adapter protocol, which has an API not too different from the one above:

https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable

Dargones · 2024-10-21T14:46:22Z

Hi, @keyboardDrummer! Here is a quick PR I made that should fix this: #5847. I agree that in the long term it would be great to use the debugger UI, though this might take one or two weeks of full-time work. This PR (as in, the one I am leaving this comment on) does a lot of heavy lifting towards that goal, I think.

keyboardDrummer · 2024-10-21T14:54:44Z

> Hi, @keyboardDrummer! Here is a quick PR I made that should fix this: #5847. I agree that in the long term it would be great to use the debugger UI, though this might take one or two weeks of full-time work. This PR (as in, the one I am leaving this comment on) does a lot of heavy lifting towards that goal, I think.

Amazing, thanks!

Aleksandr Fedchin added 30 commits September 4, 2023 08:59

Store Progress

00b7477

minor changes

cc874dc

Counterexample parity for extension vs command line

1be2388

Update DafnyRef and dev/news

d5505f4

Update ProverLogRegression

8d428bc

Update existing test

77c4e8f

primitive types

6683cc7

Merge branch 'CounterexampleParity' into CounterexamplesAsPredicates

94db77a

Save temp changes

8c87dff

Sync with master

f38b9f1

Add Values field

f358e51

Calculate Cardinality

d3e50ec

Separate constraints into their own class

038395a

Fixes and testing

5cbe1de

Resolve all literals

acf2179

push minor fixes

af19151

Prevent inifinite recursion

0e7da20

Keep track of MapEmpty

b57b5f1

Fix arities

726c164

Modify definition selection

b8426f0

Another small fix

4ae99f6

Fix test

1e2e991

Update wellformedness rules

b4eb8a9

Change order of definitions

0284161

Pass Test Generation tests

d021fa2

Do not constrain elements of a sequence that are out of bounds

fe7d9aa

Merge remote-tracking branch 'origin/master' into CounterexampleParity

83cc425

Enable counterexamples for multiple locations in the program

d182a13

Temprorarily add boogie submodule

65d2c04

Fix test-generation translation pass and Update tests

00fa2a5

Add a warning indicating that the counterexample may be inconsistent …

5975ea7

…of invalid

Dargones dismissed keyboardDrummer’s stale review via 5975ea7 March 21, 2024 15:41

keyboardDrummer reviewed Mar 22, 2024

keyboardDrummer previously approved these changes Mar 22, 2024

keyboardDrummer reviewed Mar 22, 2024

Aleksandr Fedchin added 3 commits April 8, 2024 10:05

Merge remote-tracking branch 'origin/master' into CounterexamplesAsPr…

6f36f77

…edicates

Fix nitpick

203f725

Lift note on counterexample validity into its own section

48812d5

Dargones dismissed keyboardDrummer’s stale review via 48812d5 April 8, 2024 17:53

Fix tests

275db10

atomb approved these changes Apr 8, 2024

atomb added this pull request to the merge queue Apr 8, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 8, 2024

atomb added this pull request to the merge queue Apr 8, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Apr 8, 2024

Dargones mentioned this pull request Apr 9, 2024

Deprecate unicode-char #5302

Merged

atomb enabled auto-merge (squash) April 9, 2024 15:28

atomb and others added 3 commits April 9, 2024 08:28

Merge branch 'master' into CounterexamplesAsPredicates

d964f3c

Merge branch 'master' into CounterexamplesAsPredicates

fac4c7c

Merge branch 'master' into CounterexamplesAsPredicates

a58755c

atomb merged commit 281ed82 into dafny-lang:master Apr 9, 2024
20 checks passed

keyboardDrummer mentioned this pull request Oct 21, 2024

[Regression] Show counterexamples failing with "Counterexample request failed: TypeError: Cannot convert undefined or null to object" dafny-lang/ide-vscode#492

Open

Dargones mentioned this pull request Oct 21, 2024

Fix: Dafny server API for counterexamples #5847

Open
