-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MIR string slices #1997
Comments
Another question we will have to answer: should we emulate Rust's ability to check if a string slice over a subrange uses invalid indexing? Take this example from The Rust Programming Language: let hello = "Здравствуйте";
let s = &hello[0..1]; This will panic at runtime because the Cyrillic letter The question: Should |
This adds rudimentary support for constructing MIR `str` slice references via the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript functions. These behave much like their cousins `mir_slice_value` and `mir_slice_range_value` (which work for array slices), except that they return something of type `&str` instead of, say, `&[u8]`. See the manual for the full details and current limitations of the approach. The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share the same implementation as `mir_slice_value`/`mir_slice_range_value`, except for some minor differences regarding how the underlying types are handled. There is now a new `MirSliceInfo` data type that signals whether we are dealing with a `str` slice or an array slice. Fixes #1997.
This adds rudimentary support for constructing MIR `str` slice references via the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript functions. These behave much like their cousins `mir_slice_value` and `mir_slice_range_value` (which work for array slices), except that they return something of type `&str` instead of, say, `&[u8]`. See the manual for the full details and current limitations of the approach. The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share the same implementation as `mir_slice_value`/`mir_slice_range_value`, except for some minor differences regarding how the underlying types are handled. There is now a new `MirSliceInfo` data type that signals whether we are dealing with a `str` slice or an array slice. Fixes #1997.
This adds rudimentary support for constructing MIR `str` slice references via the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript functions. These behave much like their cousins `mir_slice_value` and `mir_slice_range_value` (which work for array slices), except that they return something of type `&str` instead of, say, `&[u8]`. See the manual for the full details and current limitations of the approach. The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share the same implementation as `mir_slice_value`/`mir_slice_range_value`, except for some minor differences regarding how the underlying types are handled. There is now a new `MirSliceInfo` data type that signals whether we are dealing with a `str` slice or an array slice. Fixes #1997.
Currently, the SAW MIR backend supports creating slices of type
&[T]
, but it does not support creating a slice of type&str
. In principle, however, SAW could support the latter without too much additional effort. Here is a sketch of how this would work in SAW:We'd need
mir_slice_str_value : MIRValue -> MIRValue
andmir_slice_str_range_value : MIRValue -> Int -> Int -> MIRValue
functions in SAWScript. Each of these functions would take aMIRValue
argument representing a reference to an array of bytes (i.e.,u8
s) and return aMIRValue
representing a&str
slice.Why an array of bytes? This is a consequence of how
crucible-mir
desugars&str
slices. Specifically,crucible-mir
represents a&str
slice as a reference to a UTF-8–encoded sequence of bytes. Therefore, taking an array of bytes is the most natural way to interface withcrucible-mir
.How are users expected to create these arrays of bytes? Most of the time, Cryptol's string literals will be a natural tool for the job. If you write something like
"abc"
in Cryptol, it will desugar to a sequence of bytes, so"abc" : [3][8]
. As luck would have it, that is exactly what we need forcrucible-mir
.This approach isn't perfect, since Cryptol isn't yet capable of handling Unicode string literals (see this issue). For instance, the expression
"roșu"
simply doesn't typecheck in Cryptol, since the character'ș'
would require 10 bits to represent instead of 8. Instead, you would have to manually encode the string into UTF-8 and write"ro\200\153u"
, where"\200\153"
is the UTF-8 encoding of the character'ș'
. If we want to do better, we'd need to change Cryptol first.The text was updated successfully, but these errors were encountered: