Inconsistent handling of Unicode characters in String theory #412
Comments
Thanks for opening the issue! I've now added some tests:
It's not too difficult to convert between UTF-16 and the SMTLIB escape format, but we'll have to decide which encoding we want to use internally. Specifically, should the |
I've added some more conversions and all solvers should now behave the same. The current format got On the other hand JavaSMT is written in Java, and the type Either choice will break the API, although I'd argue that keeping SMTLIB as the format is more in line with how the functions have worked so far (we just didn't document it). In either case I would also suggest we make @kfriedberger, @baierd:
The current implementation (and the implementation from #422) still does not handle special cases such as invalid Unicode escapes, e.g., "\u" (without a digit) or "\u{123456789}" (an escape sequence that is too long). Additionally, SMTLIB has multiple ways to represent a single string constant, and JavaSMT copies some of that behaviour: input can be given as a plain Java string in UTF-16 or in an escaped form. It is unclear how to double-escape an escape sequence in SMTLIB. With #422, JavaSMT should allow all Java characters from UTF-16 for all solvers and escape them if needed. We also provide Java characters in UTF-16 in the model.
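To make the discussion concrete, here is a minimal sketch of a conversion between Java (UTF-16) strings and the SMT-LIB 2.6 escape forms `\uXXXX` (exactly four hex digits) and `\u{X}` to `\u{XXXXX}` (one to five hex digits, code points up to 0x2FFFF). The class and method names (`SmtlibEscapes`, `escape`, `unescape`) are hypothetical and not part of JavaSMT, and the code is not taken from #422. Two policies are assumptions of the sketch: a literal backslash is always written as its own escape, so a Java string that merely looks like an escape sequence survives a round trip, and malformed or out-of-range escapes are left as literal text. Doubling of `"` inside quoted SMT-LIB literals is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JavaSMT: converts between Java strings and SMT-LIB escapes. */
final class SmtlibEscapes {
  // Matches a backslash, 'u', then either exactly four hex digits
  // or one to five hex digits enclosed in braces.
  private static final Pattern ESCAPE =
      Pattern.compile("\\\\u(?:([0-9a-fA-F]{4})|\\{([0-9a-fA-F]{1,5})\\})");

  // Largest code point allowed in SMT-LIB 2.6 strings.
  private static final int MAX_CODE_POINT = 0x2FFFF;

  private SmtlibEscapes() {}

  /** Java string to SMT-LIB form: printable ASCII stays, everything else becomes a braced escape. */
  static String escape(String javaString) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < javaString.length(); ) {
      int cp = javaString.codePointAt(i); // combines surrogate pairs into one code point
      i += Character.charCount(cp);
      if (cp > MAX_CODE_POINT) {
        throw new IllegalArgumentException("Code point out of SMT-LIB range: " + cp);
      }
      boolean printableAscii = cp >= 0x20 && cp <= 0x7E;
      if (printableAscii && cp != '\\') {
        out.append((char) cp);
      } else {
        // Assumption of this sketch: the backslash itself is escaped as well,
        // so text that looks like an escape sequence is not decoded later.
        out.append("\\u{").append(Integer.toHexString(cp)).append('}');
      }
    }
    return out.toString();
  }

  /** SMT-LIB form to Java string: decodes well-formed escapes, keeps anything else literally. */
  static String unescape(String smtlibString) {
    Matcher m = ESCAPE.matcher(smtlibString);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (m.find()) {
      String hex = m.group(1) != null ? m.group(1) : m.group(2);
      int cp = Integer.parseInt(hex, 16);
      out.append(smtlibString, last, m.start());
      if (cp > MAX_CODE_POINT) {
        out.append(m.group()); // out of range: keep the raw text
      } else {
        out.appendCodePoint(cp);
      }
      last = m.end();
    }
    out.append(smtlibString, last, smtlibString.length());
    return out.toString();
  }
}
```

With this sketch, `SmtlibEscapes.escape` turns the six-character Java string `\u{e4}` into `\u{5c}u{e4}`, and `SmtlibEscapes.unescape` maps that back to the original six characters rather than to `ä`. Inputs such as `\u` or `\u{123456789}` do not match either escape form and pass through `unescape` unchanged; whether every solver treats them the same way would still need to be checked.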
Thanks for the explanation! I've moved the remaining Z3/CVC4 issue from #420 (link) to this branch and solved it, along with some other minor points. Double escaping is possible by substituting one (or all) of the letters from the escape sequence, for instance:
Escaping the
Here we need to substitute Even with this fix there still is a bit of an issue if we continue to use the concatenated Strings:
It might be a bit unexpected that Maybe one solution would be to simply ignore escape sequences in
The same could also be used when getting values from the model:
This could for instance be useful when trying to print the result to a SMTLIB file where |
Different solvers return different results when using Unicode characters in String theory.
This should be analyzed. Maybe we need to fix JavaSMT or report to the solvers' developers.
Details: #391 (comment)