Why Rust for cross language linking
We would like to develop a low-level library that can be used to construct multiple higher level libraries in several different languages.
To do this we need the library to be very robust, well defined, and able to interoperate well with other languages.
Cross-language calls add many requirements that aren't generally present when you are working within a single language.
Consider how two different languages might differ:
- What character encoding is used for strings?
  - Does conversion need to be done? Can the foreign string type be instantiated?
- How does error handling work in the calling language?
  - Mapping one language's exceptions to another can be complicated.
- Does data need to be aligned?
  - Calling conventions and object alignment need to agree.
- Are there OS/target-specific requirements?
  - A Python library might expect its dependencies to be architecture agnostic, which can be complicated for a native library.
- What allocator is used to allocate memory?
  - Languages like Go and Java expect memory to be allocated in a particular way.
- Are there restrictions on threading?
  - For example, Python's global interpreter lock.
- Is the memory garbage collected, in one or both languages?
  - Is there an issue handing data from one heap to another?
  - Can the implementation safely hold onto objects passed after the call has returned?
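To make a few of these questions concrete, here is a minimal sketch of what a C-compatible entry point written in Rust might look like. The function name `count_chars` and the status codes are hypothetical, chosen only for illustration. It assumes the caller passes a NUL-terminated string, validates the encoding explicitly, and reports failures through a return code because exceptions cannot cross the boundary.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical status codes; a real library would document its own.
pub const STATUS_OK: i32 = 0;
pub const STATUS_NULL_POINTER: i32 = -1;
pub const STATUS_INVALID_UTF8: i32 = -2;

/// Counts the Unicode scalar values in a NUL-terminated string.
///
/// A real library would also mark this function with the `no_mangle`
/// attribute so that other languages can find the symbol by name.
///
/// # Safety
/// `text` must be null or point to a valid NUL-terminated string, and
/// `out_len` must be null or point to writable memory for one `usize`.
pub unsafe extern "C" fn count_chars(text: *const c_char, out_len: *mut usize) -> i32 {
    if text.is_null() || out_len.is_null() {
        return STATUS_NULL_POINTER;
    }
    // Borrow the caller's bytes; nothing is copied and nothing is freed here.
    let c_str = unsafe { CStr::from_ptr(text) };
    // The foreign string may not be UTF-8, so the conversion is checked.
    match c_str.to_str() {
        Ok(s) => {
            unsafe { *out_len = s.chars().count() };
            STATUS_OK
        }
        Err(_) => STATUS_INVALID_UTF8,
    }
}
```

Even in this small case the encoding, the error convention, and the ownership of the string all have to be decided and written down in the doc comment, since the C signature alone cannot express them.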
As a result, libraries that are intended to be used from another language tend to benefit from the maximum level of control and precision in the specification of their interfaces.
Suppose you want to write a function similar to the following Java:
interface Writer {
public void write(ByteBuffer src) throws IOException;
}
This signature conveys a lot of information. It provides the parameters, the return type, and an exception for a way it can fail.
Because Java is memory managed, there is no need to worry about whose responsibility it is to free src.
It also has some ambiguities, such as whether the method will modify src. You might guess it would update the position, but you'd have to read the documentation to know for sure. It could also throw additional RuntimeExceptions.
Compare this to a similar C++ function:
size_t write(const void * ptr, size_t size, size_t count);
This function similarly has inputs and a return value. It however has several additional ambiguities:
- C++ is not memory managed, so documentation is needed to determine who should free the memory pointed to by ptr.
- While C++ has exceptions, they are not used here (this is common). Instead the return value must be checked, which can be forgotten.
- size and count describe how much data can be read from ptr, but nothing actually enforces this. If the passed sizes are too large it can be a security issue.
- Like all methods in C++, it is not clear from, or enforced by, its signature whether it is thread-safe, signal-safe, or exception-safe.
A signature from C or Go would look nearly identical and have these same ambiguities.
Rust has become an option for writing native libraries because of its focus on control and safety.
Many people in the Python, Ruby, and JS communities have adopted Rust as the preferred way to write "performance critical" functions in native code that can be invoked as part of high level libraries.
Compare the above to a signature in Rust:
pub trait Writer: Sync {
fn write(&self, buf: &[u8]) -> Result<(), WriteErr>;
}
pub enum WriteErr {
NoSuchStream(),
CouldNotContactServer(),
//...
}
This signature encodes a lot more information. Here the caller knows:
- It is safe to call Writer from multiple threads concurrently.
- The buf argument will not be modified by write.
- The write method won't retain any reference or pointer to any data in buf after the call returns.
- It is the caller's responsibility to free buf; the implementation will not do it.
- The Result provides an exhaustive list of the ways the method can fail, which can all be handled explicitly.
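As an illustration of these points, here is a minimal sketch of one possible implementation of the trait. `MemoryWriter`, its `open` flag, and the `describe` helper are hypothetical names invented for this example; the trait and error enum are repeated from above so the block stands alone.

```rust
use std::sync::Mutex;

pub trait Writer: Sync {
    fn write(&self, buf: &[u8]) -> Result<(), WriteErr>;
}

#[derive(Debug, PartialEq)]
pub enum WriteErr {
    NoSuchStream(),
    CouldNotContactServer(),
}

/// Hypothetical in-memory writer, used only to illustrate the trait.
pub struct MemoryWriter {
    open: bool,
    // `write` takes `&self` and the type must be `Sync`, so shared
    // mutation has to go through a lock; the compiler enforces this.
    data: Mutex<Vec<u8>>,
}

impl MemoryWriter {
    pub fn new(open: bool) -> Self {
        MemoryWriter { open, data: Mutex::new(Vec::new()) }
    }

    pub fn contents(&self) -> Vec<u8> {
        self.data.lock().unwrap().clone()
    }
}

impl Writer for MemoryWriter {
    fn write(&self, buf: &[u8]) -> Result<(), WriteErr> {
        if !self.open {
            return Err(WriteErr::NoSuchStream());
        }
        // `buf` is only borrowed for the duration of the call, so the
        // bytes must be copied if they are to outlive it.
        self.data.lock().unwrap().extend_from_slice(buf);
        Ok(())
    }
}

/// The match must cover every variant, or the code does not compile.
pub fn describe(result: Result<(), WriteErr>) -> &'static str {
    match result {
        Ok(()) => "written",
        Err(WriteErr::NoSuchStream()) => "no such stream",
        Err(WriteErr::CouldNotContactServer()) => "server unreachable",
    }
}
```

If a new variant were later added to `WriteErr`, the `match` in `describe` would stop compiling until the new case was handled, which is the sense in which the list of failures is exhaustive.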
Similarly it allows for control of how memory is allocated and who is responsible for deallocating it.
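One common way to make that responsibility explicit across a language boundary is a matching create/free pair, so memory is always released by the same allocator that produced it. The sketch below assumes a hypothetical `Buffer` type and hypothetical function names `buffer_new`, `buffer_push`, and `buffer_free`; it is an illustration of the pattern, not an API from any existing library.

```rust
/// Hypothetical opaque type handed to foreign callers as a raw pointer.
pub struct Buffer {
    bytes: Vec<u8>,
}

/// Allocates a Buffer with Rust's allocator and transfers ownership
/// to the caller, who must eventually pass it to `buffer_free`.
pub extern "C" fn buffer_new(capacity: usize) -> *mut Buffer {
    let buffer = Box::new(Buffer { bytes: Vec::with_capacity(capacity) });
    Box::into_raw(buffer)
}

/// Appends one byte and returns the new length, or 0 if `buf` is null.
///
/// # Safety
/// `buf` must be null or a pointer returned by `buffer_new` that has
/// not yet been freed, and no other thread may be using it.
pub unsafe extern "C" fn buffer_push(buf: *mut Buffer, byte: u8) -> usize {
    match unsafe { buf.as_mut() } {
        Some(buffer) => {
            buffer.bytes.push(byte);
            buffer.bytes.len()
        }
        None => 0,
    }
}

/// Takes ownership back and frees the Buffer. Null is ignored.
///
/// # Safety
/// `buf` must be null or a pointer returned by `buffer_new` that has
/// not already been freed.
pub unsafe extern "C" fn buffer_free(buf: *mut Buffer) {
    if !buf.is_null() {
        // Rebuilding the Box hands the memory back to Rust, which
        // releases it when the Box is dropped.
        drop(unsafe { Box::from_raw(buf) });
    }
}
```

Inside the library, `Box` tracks ownership automatically; only at the boundary does the contract have to be restated in documentation, because the foreign caller cannot see Rust's types.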
Safety combined with Rust's great packaging tools has made it the language of choice for writing small reusable libraries. As a point of comparison: Go has more total developers who know the language, but Rust has more open source libraries available for download and is adding new ones more quickly than Go. This may be important depending on how frequently you need to look for a library to depend on. For example, the process for linking Python to Go has significantly less automation and tooling than doing the equivalent in Rust.
Explicitness is a double-edged sword. Spelling out every detail may slow down writing code when there is only ever one implementation, one call site, and one author. To make writing code fast, languages like Python and Go have chosen to omit specification detail they consider unnecessary.
When there are many callers and a function gets reused and maintained for a long time, the maintenance burden makes it worthwhile to spend the extra time up front and use the compiler to detect problems. In short: any savings in programming time must be weighed against the bugs that could have been prevented.
For the case where we intend to implement a single high performance library and call into it from many languages, specifying additional information is well worth our time.
A longer description of Rust's values is here.
Rust uses the additional information to provide additional guarantees. Specifically, if code that does not use unsafe compiles, it is guaranteed to:
- Comply with all aspects of the interface (or it will be a compiler error)
- Be free of data races
- Have no null pointer exceptions
- Have no class cast exceptions
- Have no iterator invalidation bugs
- Have no use after free bugs
- Have no double free bugs
- Have no dangling pointer bugs
- Free memory automatically when its owner goes out of scope (leaks are still possible, for example through reference cycles, but they are uncommon)
- Not be susceptible to buffer overflows or similar attacks (out-of-bounds accesses panic rather than corrupt memory)
This is a direct result of Rust's ownership model, which is its most distinctive feature and is leveraged in many ways to provide strong guarantees.
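Two of these guarantees can be seen in a few lines. The sketch below uses hypothetical function names (`stream_length`, `consume`, `demo`) and shows how a possibly missing value is represented with `Option` instead of null, and how passing a value by ownership prevents later use of freed memory.

```rust
use std::collections::HashMap;

/// Returns the length of a named stream, or None if it does not exist.
/// The caller cannot use the length without handling the None case.
pub fn stream_length(streams: &HashMap<String, Vec<u8>>, name: &str) -> Option<usize> {
    streams.get(name).map(|bytes| bytes.len())
}

/// Takes ownership of the buffer; it is freed when this function returns.
pub fn consume(buffer: Vec<u8>) -> usize {
    buffer.len()
}

pub fn demo() -> (Option<usize>, Option<usize>, usize) {
    let mut streams = HashMap::new();
    streams.insert("events".to_string(), vec![1u8, 2, 3]);

    let found = stream_length(&streams, "events");
    let missing = stream_length(&streams, "missing");

    let buffer = vec![0u8; 16];
    let consumed = consume(buffer);
    // Uncommenting the next line is a compile error ("use of moved value"),
    // which is how use after free is ruled out before the program runs.
    // let again = buffer.len();

    (found, missing, consumed)
}
```

Neither check costs anything at run time beyond the `Option` itself; the ownership rule is enforced entirely by the compiler.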
In reality we probably won't have to deal with all of these details by hand, because Rust has high level frameworks such as PyO3 for Python, Neon for Node.js, Helix for Ruby, and cbindgen for C and C++, which make cross-language calls much easier by automating most of this work.