Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Unicode Strings #863

Open
jpziegler opened this issue Aug 11, 2020 · 4 comments
Open

Support Unicode Strings #863

jpziegler opened this issue Aug 11, 2020 · 4 comments
Labels
feature request Asking for new or improved functionality

Comments

@jpziegler
Copy link
Contributor

jpziegler commented Aug 11, 2020

Many crypto puzzles, such as substitution ciphers, use unusual characters in their alphabets. (For example, Edgar Allen Poe's Gold-Bug cipher.) It would be useful, for demos and recreational crypto at least, to allow such characters to be used as data in Cryptol.

Currently, Cryptol supports Unicode for symbol names, but not Strings.

It's possible that variable-length encodings would be challenging given Cryptol's type system, but UTF-32 might make sense.

@brianhuffman
Copy link
Contributor

I can think of a few possible designs for this.

  1. Unicode strings denote lists of UTF-8 bytes. This means that a string like "día" would denote the 4-byte string [0x64,0xC3,0xAD,0x61]. Unfortunately single-character literals wouldn't work beyond the 128 ascii codepoints, as those are the only 1-byte UTF-8 points. Also, it seems like a variable-length encoding would not be particularly useful for e.g. substitution ciphers. For these reasons I don't like this choice.

  2. Redefine Char from [8] to a larger fixed width. We could choose [16], which would cover most of unicode, or [32], which would yield UTF-32, as you suggested. One problem is that existing cryptol code written to use the 8-bit Char type will break. Also (and I don't know if this is actually something that people do) currently we can use :readByteArray and :writeByteArray to read/write strings to files, and we'd lose that ability if we got rid of 8-bit strings.

  3. Characters have polymorphic width, just like decimal literals. This would allow ascii strings to have type [n][8] (or even [n][7]), whereas strings containing codepoints beyond the first 256 could fit into wider types like [n][16] or [n][32]. So e.g. a character literal like 'Γ' would mean exactly the same thing as writing the decimal number 915, and we'd have 'Γ' : {a} (Literal 915 a) => a.

I think choice 3 would be really easy to implement, because we basically implement it that way already (except that we add an extra type constraint of [8] to every character) as we can see from this type error message:

Cryptol> 'Γ'

[error] at <interactive>:1:1--1:4:
  Unsolvable constraints:
    • 8 >= 10
        arising from
        use of literal or demoted expression
        at <interactive>:1:1--1:4
    • Reason: It is not the case that 8 >= 10

@weaversa
Copy link
Collaborator

To support variant 3: We have done specs that use both 6- and 7-bit ascii, and the conversions were a bit cumbersome.

brianhuffman pushed a commit that referenced this issue Aug 11, 2020
This makes it possible to use wider character types to represent
Unicode code points beyond the first 256.

See #863.
@brianhuffman
Copy link
Contributor

I made a draft PR (#864) so that people can try out polymorphic character literals, and let us know what you think. It was indeed very easy to implement.

The change breaks a few tests in our regression test suite: I had to add type constraints to string literals in six different tests in order to make them work. (The failures were things like :sat or :check complaining about a non-monomorphic type, or warnings about defaulting to Integer.)

@yav yav added the feature request Asking for new or improved functionality label Sep 17, 2020
@jpziegler
Copy link
Contributor Author

An alternative solution could be some kind of printing function or command, such as an enhanced trace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Asking for new or improved functionality
Projects
None yet
Development

No branches or pull requests

5 participants