Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support MIR string slices #1997

Closed
RyanGlScott opened this issue Dec 19, 2023 · 1 comment · Fixed by #2025
Closed

Support MIR string slices #1997

RyanGlScott opened this issue Dec 19, 2023 · 1 comment · Fixed by #2025
Labels
subsystem: crucible-mir Issues related to Rust verification with crucible-mir and/or mir-json type: enhancement Issues describing an improvement to an existing feature or capability

Comments

@RyanGlScott
Copy link
Contributor

Currently, the SAW MIR backend supports creating slices of type &[T], but it does not support creating a slice of type &str. In principle, however, SAW could support the latter without too much additional effort. Here is a sketch of how this would work in SAW:

  • We'd need mir_slice_str_value : MIRValue -> MIRValue and mir_slice_str_range_value : MIRValue -> Int -> Int -> MIRValue functions in SAWScript. Each of these functions would take a MIRValue argument representing a reference to an array of bytes (i.e., u8s) and return a MIRValue representing a &str slice.

  • Why an array of bytes? This is a consequence of how crucible-mir desugars &str slices. Specifically, crucible-mir represents a &str slice as a reference to a UTF-8–encoded sequence of bytes. Therefore, taking an array of bytes is the most natural way to interface with crucible-mir.

  • How are users expected to create these arrays of bytes? Most of the time, Cryptol's string literals will be a natural tool for the job. If you write something like "abc" in Cryptol, it will desugar to a sequence of bytes, so "abc" : [3][8]. As luck would have it, that is exactly what we need for crucible-mir.

  • This approach isn't perfect, since Cryptol isn't yet capable of handling Unicode string literals (see this issue). For instance, the expression "roșu" simply doesn't typecheck in Cryptol, since the character 'ș' would require 10 bits to represent instead of 8. Instead, you would have to manually encode the string into UTF-8 and write "ro\200\153u", where "\200\153" is the UTF-8 encoding of the character 'ș'. If we want to do better, we'd need to change Cryptol first.

@RyanGlScott RyanGlScott added type: enhancement Issues describing an improvement to an existing feature or capability subsystem: crucible-mir Issues related to Rust verification with crucible-mir and/or mir-json labels Dec 19, 2023
@RyanGlScott
Copy link
Contributor Author

Another question we will have to answer: should we emulate Rust's ability to check if a string slice over a subrange uses invalid indexing? Take this example from The Rust Programming Language:

let hello = "Здравствуйте";

let s = &hello[0..1];

This will panic at runtime because the Cyrillic letter З requires two bytes in UTF-8, but the range [0..1] only accesses the first of these two bytes, resulting in a slice that doesn't end at a char boundary.

The question: Should mir_str_slice_range_value str_ref 0 1 perform the same check on str_ref's underlying string? Doing so would require using the crucible-mir memory model to emulate the behavior of the is_char_boundary function. Not impossible to do, but likely a bit fiddly. If we opt not to do this, then we should advertise this shortcoming in the SAW manual.

RyanGlScott added a commit that referenced this issue Feb 7, 2024
This adds rudimentary support for constructing MIR `str` slice references via
the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript
functions. These behave much like their cousins `mir_slice_value` and
`mir_slice_range_value` (which work for array slices), except that they return
something of type `&str` instead of, say, `&[u8]`. See the manual for the full
details and current limitations of the approach.

The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share
the same implementation as `mir_slice_value`/`mir_slice_range_value`, except
for some minor differences regarding how the underlying types are handled.
There is now a new `MirSliceInfo` data type that signals whether we are dealing
with a `str` slice or an array slice.

Fixes #1997.
RyanGlScott added a commit that referenced this issue Mar 29, 2024
This adds rudimentary support for constructing MIR `str` slice references via
the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript
functions. These behave much like their cousins `mir_slice_value` and
`mir_slice_range_value` (which work for array slices), except that they return
something of type `&str` instead of, say, `&[u8]`. See the manual for the full
details and current limitations of the approach.

The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share
the same implementation as `mir_slice_value`/`mir_slice_range_value`, except
for some minor differences regarding how the underlying types are handled.
There is now a new `MirSliceInfo` data type that signals whether we are dealing
with a `str` slice or an array slice.

Fixes #1997.
RyanGlScott added a commit that referenced this issue Jul 2, 2024
This adds rudimentary support for constructing MIR `str` slice references via
the new `mir_str_slice_value` and `mir_str_slice_range_value` SAWScript
functions. These behave much like their cousins `mir_slice_value` and
`mir_slice_range_value` (which work for array slices), except that they return
something of type `&str` instead of, say, `&[u8]`. See the manual for the full
details and current limitations of the approach.

The `mir_str_slice_value`/`mir_str_slice_range_value` functions mostly share
the same implementation as `mir_slice_value`/`mir_slice_range_value`, except
for some minor differences regarding how the underlying types are handled.
There is now a new `MirSliceInfo` data type that signals whether we are dealing
with a `str` slice or an array slice.

Fixes #1997.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
subsystem: crucible-mir Issues related to Rust verification with crucible-mir and/or mir-json type: enhancement Issues describing an improvement to an existing feature or capability
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant