Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate unicode-char #5302

Merged

Conversation

keyboardDrummer
Copy link
Member

@keyboardDrummer keyboardDrummer commented Apr 5, 2024

Description

  • Deprecate unicode-char
  • Swap all the tests in /unicodechar with their equivalents outside of /unicodechar, so the tests in the folder are the ones using the non-default --unicode-char false

How has this been tested?

  • /unicode-char/Arrays.dfy does not have --allow-deprecation and outputs the deprecation warning that this PR introduces
  • Updated expect files of CLI tests

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@keyboardDrummer keyboardDrummer marked this pull request as draft April 5, 2024 15:24
@@ -1,11 +1,11 @@

// Generating and Running Block-Based Tests:
// RUN: %baredafny generate-tests %args Block %S/TestGenerationNoInliningEnumerativeDefinitions.dfy > %t-tests.dfy
// RUN: %baredafny test %args --unicode-char:false --target:cs "%t-tests.dfy" >> "%t"
// RUN: %baredafny test %args --allow-deprecation --unicode-char false --target:cs "%t-tests.dfy" >> "%t"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dargones how come the test generation blogpost is using --unicode-char false?

Copy link
Collaborator

@Dargones Dargones Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5013 should fix this, among other things, since I have updated the format in which counterexamples report unicode characters. Once #5013 is merged, it should be possible to remove --allow-deprecation --unicode-char false from these tests as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll remove unicode-char=false here then.

@keyboardDrummer keyboardDrummer marked this pull request as ready for review April 10, 2024 10:41
@keyboardDrummer
Copy link
Member Author

@robin-aws There are quite a few tests using --unicode-char false that don't seem to be meant to test this option. I'm guessing that option was slapped on so the test wouldn't need any migration. Could you attempt to remove that from a few of them?

@keyboardDrummer keyboardDrummer enabled auto-merge (squash) April 10, 2024 12:29
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change! And this makes Arrays.dfy.go.expect unnecessary now, so we should delete it.

That also says to me that %testDafnyForEachCompiler should probably fail if a test passes for a backend but one of those expect files exists! That might be worth adding so that we can delete all the unused files, otherwise folks will get confused in the future.

Copy link
Member Author

@keyboardDrummer keyboardDrummer Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change anything, besides swapping comp/Arrays and unicodechars/comp/Arrays and their related files, and adding --allow-warnings to the version that has --unicode-char false, and updating its expect file to include the new warning

@@ -1,7 +1,6 @@
// NONUNIFORM: Multiple testing scenarios, highly backend sensitive, testing CLI
// RUN: %verify "%s" > "%t"
// RUN: %run --no-verify --target:cs "%s" Csharp 1 >> "%t"
// RUN: %run --no-verify --target:cpp --unicode-char:false "%s" Cpp Yipee >> "%t"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep these, otherwise we're reducing testing coverage for the cpp backend.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are kept. I've swapped comp/CompileWithArguments.dfy and unicode/comp/CompileWithArguments.dfy. I realize this is a confusing change in this PR, and it would have been better to do it before or after, but I think it might be bit of a hassle to extract it as a separate PR now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, that's just fine.

Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got halfway through reviewing and spotted a problem: the C++ backend ONLY supports --unicode-char false. So if we unconditionally mark it deprecated we're essentially marking that backend deprecated, which I don't think we should do.

I think we have two options:

  1. Mark the mode deprecated only for all other backends. I'm not sure how to do that cleanly, since it implies we can't deprecate it when just verifying.
  2. Implement the mode for C++. That's non-trivial work and likely involves pulling in another runtime dependency, which is why I left it out of scope initially.

@robin-aws
Copy link
Member

bin-aws There are quite a few tests using --unicode-char false that don't seem to be meant to test this option. I'm guessing that option was slapped on so the test wouldn't need any migration. Could you attempt to remove that from a few of them?

Definitely willing to, but reasonable to do in a follow-up PR, since leaving them as is doesn't reduce testing coverage?

@keyboardDrummer
Copy link
Member Author

keyboardDrummer commented Apr 11, 2024

Got halfway through reviewing and spotted a problem: the C++ backend ONLY supports --unicode-char false.

Yes I saw that :-D

So if we unconditionally mark it deprecated we're essentially marking that backend deprecated, which I don't think we should do.

I think we have two options:

  • Mark the mode deprecated only for all other backends. I'm not sure how to do that cleanly, since it implies we can't deprecate it when just verifying.
  • Implement the mode for C++. That's non-trivial work and likely involves pulling in another runtime dependency, which is why I left it out of scope initially.

I think you're saying that the C++ backend has the same level of support as the other ones, albeit with possibly fewer supported features. If so, then --unicode-char false should be the default, so users of the C++ back-end don't run into the nasty surprise of verifying first with the default value, and then realizing they need a non-default to translate. I would then also mark --unicode-char as experimental, since the feature is not supported by all supported back-ends.

However, I would suggest doing the opposite, and I think you implicitly made the same decision when you made --unicode-char true the default without enabling support for this in the C++ backend: regard the C++ back-end as experimental, and accept that the default --unicode-char value for dafny verify does not work with the C++ back-end. We can turn off the deprecation warning when the C++ back-end is used. This is the first of two options you mentioned, with the addition of adding an experimental tag to the C++ back-end.

@robin-aws
Copy link
Member

However, I would suggest doing the opposite, and I think you implicitly made the same decision when you made --unicode-char true the default without enabling support for this in the C++ backend: regard the C++ back-end as experimental, and accept that the default --unicode-char value for dafny verify does not work with the C++ back-end. We can turn off the deprecation warning when the C++ back-end is used. This is the first of two options you mentioned, with the addition of adding an experimental tag to the C++ back-end.

Yup I agree this is the best option - we've already been documenting the C++ backend as incomplete/experimental in more ad-hoc ways. If you're saying you want to emit a regular warning when it's used as well, go right ahead!

Comment on lines 357 to 358
if (error == "" && OnlyAllowedOutputLines(backend, outputString)) {
if (error == "" && HasUnsupported(backend, outputString)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this do the wrong thing if there are unsupported feature lines but also some unrelated crash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can change it back. Might have been for something that changed already.

@robin-aws
Copy link
Member

If it doesn't break things badly somehow, can we rename the unicodechars test directory to unicodecharfalse or something similar, so it's more obvious we've flipped the focus of those tests?

(aside from those in unicodechars and c++)

All remaining tests are using —unicode-char as an example for general compatibility testing, or using the C++ backend. The only exception is git-issue-356[-errors].dfy, which could be duplicated for `—unicode-char true` mode, but wouldn’t really add any additional coverage.
@robin-aws
Copy link
Member

There are quite a few tests using --unicode-char false that don't seem to be meant to test this option. I'm guessing that option was slapped on so the test wouldn't need any migration. Could you attempt to remove that from a few of them?

I pushed a commit that removes it from just about all of them (see commit message for the caveats)

@robin-aws
Copy link
Member

robin-aws commented Apr 16, 2024

Ah switching ErasableTypeWrappers.dfy revealed a bug: Java prints <gen> when all other backends print ['<', 'g', 'e', 'n', '>']. I think either could be correct, since it's an edge case https://github.com/dafny-lang/dafny/blob/master/docs/Compilation/StringsAndChars.md#printing-strings-and-characters doesn't completely address, since whether the type is "statically known" is different before vs. after the optimization. But it should be consistent, and the latter is easier, so there's probably a special case for Java that we should just remove.

@keyboardDrummer if you can see how to fix this easily be my guess, otherwise I'll figure it out tomorrow.

@keyboardDrummer
Copy link
Member Author

Ah switching ErasableTypeWrappers.dfy revealed a bug: Java prints <gen> when all other backends print ['<', 'g', 'e', 'n', '>']. I think either could be correct, since it's an edge case https://github.com/dafny-lang/dafny/blob/master/docs/Compilation/StringsAndChars.md#printing-strings-and-characters doesn't completely address, since whether the type is "statically known" is different before vs. after the optimization. But it should be consistent, and the latter is easier, so there's probably a special case for Java that we should just remove.

@keyboardDrummer if you can see how to fix this easily be my guess, otherwise I'll figure it out tomorrow.

The print statement for Java seems to be one big heuristic:

bool isGeneric = type.NormalizeToAncestorType().AsSeqType is { Arg: { IsTypeParameter: true } };
if (type.NormalizeToAncestorType().IsStringType) {
  argumentWriter = wr.ForkInParens();
  wr.Write(".verbatimString()");
} else if (type.NormalizeToAncestorType().IsCharType && UnicodeCharEnabled) {
  wr.Write($"{DafnyHelpersClass}.ToCharLiteral(");
  argumentWriter = wr.Fork();
  wr.Write(")");
} else if (isGeneric && !UnicodeCharEnabled) {
  wr.Write($"((java.util.function.Function<{DafnySeqClass}<?>,String>)(_s -> (_s.elementType().defaultValue().getClass() == java.lang.Character.class ? _s.verbatimString() : String.valueOf(_s)))).apply(");
  argumentWriter = wr.Fork();
  wr.Write(")");
} else {
  wr.Write("java.lang.String.valueOf(");
  argumentWriter = wr.Fork();
  wr.Write(")");
}

I think using reflection as this does is not going to work reliably. Instead you need a virtual method, either the standard string toString or a interface ICanPrint { string toPrintString() }

I think resolving that will be some work. How about we do not migrate this test yet, and instead file a GH issue to resolve this?

@robin-aws
Copy link
Member

Actually I think it's likely an easy fix: all those heuristics are exactly what --unicode-char true mode is supposed to turn off, so I think it's just a matter of tweaking the checking for UnicodeCharEnabled to account for the optimization as well. I'll give it a quick try before giving up and cutting an issue. :)

Comment on lines 378 to 382
// If we hit errors, check for known unsupported features or bugs for this compilation target
if (error == "" && OnlyAllowedOutputLines(backend, outputString)) {
return 0;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reverting the other change. Now AFAICT this is identical to lines 357-360, and the if block can't possibly change the result since it returns, so I think you can nuke this too.

@robin-aws
Copy link
Member

Yup turns out there was a helper method for resolving the type of an expression accounting for datatype wrapper erasure, which only Java was calling (probably because it was necessary to handle primitive types correctly). Adding it to the rest of the backends made the print behavior consistent and better! See 18d7059

Copy link
Member

@robin-aws robin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\U1F680

@keyboardDrummer keyboardDrummer merged commit ac24aef into dafny-lang:master Apr 19, 2024
20 checks passed
@keyboardDrummer keyboardDrummer deleted the deprecateUnicodeChar branch April 19, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants