Sonder is a static analyser and transpiler, for converting well-written C code to Rust.
This project is incredible WIP and not at all intended for production-grade work. It exists purely as a proof of concept.
For the purposes of sonder, well-written means that for any given pointer, the C code in question either
- Treats it like a Rust reference in accordance with borrow-checking rules or
- Treats it a cloned
Rc<RefCell<T>>
If the code isn't "well-written", sonder will fall back on unsafe raw pointers in the generated code, although at the moment, it instead panics.
- The adding addresses
&foo + &bar // illegal
- Dereferencing pointer arithmatic
*(foo + bar) // illegal
I'm aware that this includes legal array indexing if foo or bar aren't pointers, however, the sonder ast treats that as a totally seperate thing. So arr[foo]
is legal while *(arr + foo)
or foo[arr]
aren't, even though they're equivalent.
- Dereferencing non-ptrs
*not_ptr // illegal
- TEMPORARY ASSUMPTION: Mutable references can't be made unless they're tied to a ptr declaration. Adresses are always immutable unless explicitely annotated otherwise by the ptr declaration.
list.append(&mut other_list) // not something we're going to worry about for now
In the future, we will treat arguments as being bound to parameters as variables
So,&mut other_list
will be treated as if it's bound to the variable value inside the function (see below)
Any of these will immediantly result in raw pointers being used, although at the moment, they panic
The issue is, while every ref should be attatched to a symbol, not all c pointers will be. This means that we need a way to safely handle them without just panicking because we have a vector with more than one element.
Currently every instance of an address being taken it is assumed to be assigned to a pointer variable.
Specifically, every pointer variable was assumed to be directly tied to that reference and only that reference.
This is really convinient for handling all analysis in one big vars
map data-structure, but causes problems if we want to do things like this:
some_function(&p); // &p is not tied to any variable at all
// or
int n = 0;
int g = &n;
int m = 1;
g = &m; // g points to different things at different times AHHHHHH
In both cases, a PtrDeclaration isn't handling the reference (a FunctionCall and Assignment) are respectively.
In the first case, the plan was to have the address argument be assigned to the corredsponding parameter variable and let however we handle scope figure this out.
However, the second case presents a challenge, g
points to different values at different times, which makes the static analysis we've been doing really tricky.
Instead of pushing the first issue back to be dealt with by whatever code handles scopes and functions, it seems that we should handle them with a single refactor of the way references are tracked in analysis.
This will mean having a seperate data-structure for tracking references and having variables optionally link to them or vise versa, instead of ptr_data being optionally contained within var_data. This linking could be done via Bimap or mutual references. The data structure for keeping track of references would need to preserve the order that the references are taken in, so a vector seems fine
The question then becomes, how do we link address and variable? An optional field in VarData seems like the best choice.
int main() {
int t = 0;
int* k = &t;
*k = t + 1;
}
In this case, Rc won't work, since the borrow and mutable borrow are on the same line in the invalid translated code. Meaning we would get something like this:
fn main() -> () {
let mut t = Rc::new(RefCell::new(0));
let k = t.clone();
*k.borrow_mut() = t.borrow() + 1; // On the same line, panics
}
If the &mut and (sub_value | &mut) are used on the same line (that isn't the borrowing line) Cloning or raw ptrs are required.
We then need to decide if changing the semantics of the program any amount is valid, such as placing a new item on the stack for the sake of memory safety
This sounds really difficult, so
eg.
fn main() -> () {
let mut t = 0;
let t_tmp = t.clone();
let k = &mut t;
*k = t_tmp + 1;
}
or
fn main() -> () {
let mut t = 0;
let k = &mut t as *mut i32;
unsafe {
*k = t + 1;
}
}
- Rethink reference tracking
- Write more test cases for the current prototype
- Figure out how to represent scope
- Create a system for managing scope
- Write scope-based borrowing checking
Sonder is broken up into a few differenc components for performing different tasks; they are run sequentially. Note that all pointers are assumed to be either immutable or mutable references unless the checker deems otherwise
The Analyser determines the necessary variable semantics for performing borrow-checking on C code. The primary data-structure involved is a map related variable names to information that would be aparent in Rust code, but must be inferred in C code. This includes:
- Is the variable mutated by via pointer, directly, or not at all?
- Is the variable a pointer? If so, where does it point to, and does it mutate that variable.
- For what line-ranges does the variable appear to be in scope and not "behind a reference" (note that whether or not this actually follows borrow-checking rules is irrelevent to the analyser, it simply collects information)
The Checker performs a rudimentary, lexical form of borrow-checking, by validating the "lexical-lifetimes" of each mutable reference.
If any mutable reference to a piece of data overlaps with an immutable reference to that data or with the usage of the underlying value, the underlying variable, reference, and all other references to that variable are assumed to not follow borrow-checking rules, but still be "well-written," and are marked as Rc<RefCell>>
s.
This isn't comprehensive borrow-checking and must be extended in numerous ways, most importantly to include function-based move semantics.
The Annotater takes the information about variables produced by the Analyser and Checker and creates a new AST that includes this information in necessary places, for example:
- All declarations are annotated with whether the variable is mutable
- PtrDeclarations are annotated with the pointer type (
Rc<RefCell<T>>
,&mut
,&
,*mut
, or*const
) The generated AST is essentially a rudimentary Rust AST.
The Converter takes the annotated AST and uses it to generate a corresponding Rust program.