
%&rustgc_paper_preamble
\endofdump

\begin{document}

\begin{abstract}
\noindent Rust is a non-Garbage Collected (GCed) language, but the lack of GC
makes expressing data-structures whose values have multiple owners awkward, inefficient, or both.
In this paper we explore a new design for, and implementation of, GC in Rust, called
\ourgc, identifying the key challenge as finalisation. Unlike previous
approaches to GC in Rust, \ourgc maps existing Rust destructors to
finalisers: this makes GC in Rust natural to use but introduces surprising
soundness, performance, and ergonomic problems. \ourgc provides solutions for
each of these problems.
\end{abstract}

\maketitle


\section{Introduction}

\begin{figure}[t]
\lstinputlisting[language=Rust, firstline=6]{listings/first_example.rs}
\captionof{lstlisting}{An \ourgc example, showing use of the \lstinline{Gc<T>}
  type and destructors as finalisers. We create a type \lstinline{GcNode} which
  models a graph: it stores an 8 bit integer value and a reference (possibly
  null, via Rust's standard \lstinline{Option} type) to a neighbouring node
  (line 1). We add a normal Rust destructor which \ourgc is able to use as a
  finaliser when \lstinline{GcNode} is used inside \lstinline{Gc<T>} (line 2).
  Inside \lstinline{main} we create the first GCed node in the graph (line 5).
  We use Rust's normal \lstinline{RefCell} type to allow the node to be mutated
  (using the \lstinline{RefCell::borrow\_mut} method to dynamically detect mutation that
  would undermine Rust's static borrow checker rules)
  and pointed to itself (line 6): i.e.~we create a cycle using
  \lstinline{Gc<T>}. Cycles such as this cannot be created in the base Rust
  language without \lstinline{unsafe} code. One can create cycles with standard
  library types such as the reference counting \lstinline{Rc<T>}, though these
  require careful, laborious programming to avoid memory leaks
  (see~\cref{fig:rc_example}). We then create a second cyclic graph (lines 7
  and 8), immediately assigning it to the \lstinline{gc1} variable (line 9).
  Doing so means that the first cyclic graph \lstinline{GcNode\{value: 1, ..\}}
  is no longer reachable, so when we force a collection (line 10) that node
  will be recognised as collectable; its finaliser is scheduled to be run (causing
  \lstinline{drop 1} to be printed out at a later point), and once the finaliser has run the backing
  memory can be reclaimed. The print statement outputs \lstinline{2 2} (line 11).}
\label{fig:first_example}
\end{figure}

Amongst the ways one can classify programming languages is whether they
are Garbage Collected (GCed) or not: GCed languages enable implicit memory management;
non-GCed languages require explicit memory management (e.g.~\lstinline{C}'s \lstinline{malloc} /
\lstinline{free} functions). Rust's use of affine types and ownership does not
fit within this classification: it is not GCed but it has implicit memory management.
Most portions of most Rust programs are as
succinct as a GCed equivalent, but ownership is too inflexible to express
data-structures that require multiple owners (e.g.~doubly linked lists).
Workarounds (e.g.~reference counting) impose an extra burden on the programmer,
make mistakes more likely, and often come with a performance penalty.

In an attempt to avoid such problems, there are now a number of GCs for Rust
(e.g.~\cite{manish15rustgc, coblenz21bronze, gcarena, boa, shifgrethor}). Most
introduce a user-visible type \lstinline{Gc<T>} which takes a value
of type \lstinline{T}, moves that value to the GC heap, and returns a wrapper around
a pointer to the moved value. \lstinline{Gc<T>} can be \emph{cloned} (i.e.~duplicated) and
\emph{dereferenced} to \lstinline{T} at will by the user. When no
\lstinline{Gc<T>} values can be found, indirectly or directly, from the
program's \emph{roots} (e.g.~variables on the stack, etc.),
then the underlying memory can be reclaimed.

It has proven hard to find a satisfying design and implementation for a GC for
Rust, as perhaps suggested by the number of different attempts to do so.
We identify two fundamental challenges
for GC for Rust: how to give \lstinline{Gc<T>} a familiar, idiomatic, complete
API; and how to make \emph{finalisation} (i.e.~the code that is run just before a
value is collected by the GC) safe, performant, and ergonomic. We show that
using conservative GC is necessary and sufficient to solve the API challenge,
but the finalisation challenge is more difficult.

In existing GCs for Rust, if a user needs values of type \lstinline{T} to cause a
finaliser to run, then the user must manually implement a finaliser.
Not only do most finalisers end up duplicating
existing \emph{destructors} (i.e.~code which is run just before a value is
reclaimed by Rust's implicit memory management) but this makes it impossible to
provide finalisers for types in external libraries.

The `obvious' solution to this problem is to allow existing Rust destructors to
be automatically used as finalisers. However, this introduces a number of
problems. In the nearest context to our work, GC for C++, this solution was
explicitly ruled out as the problems were thought
insoluble~\cite[p.~32]{boehm09garbage}. We break these problems down into three
categories:
(1) some safe destructors are not safe finalisers;
(2) finalisers can be run too early;
and (3) finalisers are prohibitively slower than destructors. All are,
at least to some degree, classical GC problems; all are exacerbated
in some way by Rust; and none, with the partial exception of (2), has
existing solutions.

We introduce novel solutions, relying at least in part on Rust's unusual static
guarantees, to each of these problems. We thus gain not just a better GC for
Rust, but also solutions to open GC problems. Our solutions to the problems,
in order, are:
(1) \emph{finaliser safety analysis}
extends Rust's type system to reject programs whose destructors are not
provably safe to be used as finalisers;
(2) \emph{early finaliser prevention} automatically inserts barriers to prevent
optimisations or register allocation from `tricking'
the GC into collecting values before they are dead;
and (3) \emph{finaliser elision} statically optimises away finalisers if the
underlying destructor duplicates work the GC does anyway.

These solutions are implemented as part of \ourgc, a new GC for Rust.
Although \ourgc is not production ready, it has good enough performance
(across a number of benchmarks, its performance is \laurie{XYZ}) and
other polish (e.g.~good quality error messages) that we believe it shows
a plausible path forwards for those who may wish to follow it. Furthermore,
although \ourgc is necessarily tied to Rust, we believe that most of the
techniques in this paper, particularly those related to finalisation, are
likely to generalise to other ownership-based languages.

This paper's high-level structure is: background (\cref{sec:background});
\ourgc's design (\cref{sec:alloy_design}); destructor and finaliser challenges
and solutions (\cref{sec:destructor challenges}); and evaluation
(\laurie{XYZ}). The first three of these parts have the challenge that our work
straddles two areas that seem almost mutually exclusive: GC and Rust. We have
tried to provide sufficient material for readers expert in one of these
areas to gain adequate familiarity with the other, without undue prolixity, but perfection is beyond our
meagre grasp: we encourage readers to skip material they are already
comfortable with.


\section{Background}
\label{sec:background}

\subsection{Does Rust need a GC?}

\begin{figure}[t]
\lstinputlisting[language=Rust, firstline=5]{listings/rc_example.rs}
\captionof{lstlisting}{
  A version of~\cref{fig:first_example} using Rust's standard reference
  counting type \lstinline{Rc<T>}. To avoid memory leaks we use \emph{weak}
  references between nodes (line 1). We again create two cyclic graphs (lines
  6--9) using \lstinline{Rc::downgrade} to create weak references (lines 7 and
  9). Since \lstinline{Rc<T>} is not copyable, we must use a manual
  \lstinline{clone} call to have both the \lstinline{rc1} and \lstinline{rc2}
  variables point to the same cyclic graph (line 10). Accessing a neighbour
  node becomes a delicate dance requiring upgrading the weak reference (line 11).
  The need to downgrade \lstinline{Rc<T>} to \lstinline{Weak<T>} and upgrade
  (which may fail, hence the \lstinline{unwrap}) back to \lstinline{Rc<T>}
  creates significant extra complexity relative to~\cref{fig:first_example}: compare
  line 11 in \cref{fig:first_example} to (the much more complex) lines 10--12
  in this \lstinline{Rc} example.
}
\label{fig:rc_example}
\end{figure}

Rust uses affine types~\citep{pierce04advanced} and \emph{ownership}
to statically guarantee that: a \emph{value} (i.e.~an instance of a type) has a
single owner (e.g.~a variable); an owner can \emph{move} (i.e.~permanently
transfer the ownership of) a value to another owner; and
when a value's owner goes out of scope, the value's destructor
is run and its backing memory reclaimed. An owner can pass \emph{references} to a value
to other code, subject to these static restrictions: there can be
multiple read-only references (`\lstinline{&}') to a value or a single
read-write reference (`\lstinline{&mut}'); and references cannot outlast the owner.
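
As a minimal sketch of these rules in standard Rust (the \lstinline{Noisy} type
and \lstinline{ownership_demo} function are hypothetical names introduced purely
for illustration), the following program moves a value between owners, takes two
overlapping read-only references to it, and records when its destructor runs:

\begin{lstrustsmall}
use std::cell::RefCell;

// Hypothetical type whose destructor records that it ran.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Returns the events in the order they happened.
fn ownership_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Noisy { name: "a", log: &log };
        let b = a; // ownership moves to `b`; `a` can no longer be used
        let r1 = &b; // multiple read-only references may coexist
        let r2 = &b;
        log.borrow_mut().push(format!("read {} {}", r1.name, r2.name));
    } // `b` goes out of scope: its destructor runs exactly once
    log.into_inner()
}

fn main() {
    for event in ownership_demo() {
        println!("{}", event);
    }
}
\end{lstrustsmall}

The destructor runs once, at the end of the scope of the final owner
\lstinline{b}, and not when ownership moved away from \lstinline{a}.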

These basic rules mean that many Rust programs are as succinct as their equivalents
in GCed languages. This suggests that the search for a good GC for Rust may be Quixotic:
intellectually stimulating, but of no practical value.

However, there are many programs which need to express data structures which
are sound but which do not fit into the restrictions of affine types and
ownership. These are often described as `cyclic data-structures', though we
prefer the more general wording of `values which may have more than one owner'
(e.g.~interpreters for dynamically typed languages are examples of programs
which are best thought of in the latter category). Rust cannot directly express
such programs, forcing programmers to use various workarounds.

Probably the most common -- or, at least, the most easily recognised in others'
code -- workaround is the reference counting type \lstinline{Rc<T>} in Rust's
standard library. For many data-structures, reference counting is a reasonable
solution, but using it for values which may have multiple owners requires
juggling strong and weak counts. This complicates the program (see~\cref{fig:rc_example}) and makes
it easy for values to live for shorter or longer than intended.
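
To make the `longer than intended' case concrete, the following sketch (using
only the standard library; the node types and function names are hypothetical)
builds a one-node cycle twice: once with a strong reference, in which case the
destructor never runs, and once with a weak reference, in which case it does:

\begin{lstrustsmall}
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

// Hypothetical node types: each bumps a shared counter when destructed.
struct StrongNode {
    next: RefCell<Option<Rc<StrongNode>>>,
    drops: Rc<Cell<u32>>,
}
impl Drop for StrongNode {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

struct WeakNode {
    next: RefCell<Option<Weak<WeakNode>>>,
    drops: Rc<Cell<u32>>,
}
impl Drop for WeakNode {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// A strong self-cycle: the strong count never reaches zero.
fn strong_cycle_drops() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let n = Rc::new(StrongNode {
            next: RefCell::new(None),
            drops: Rc::clone(&drops),
        });
        *n.next.borrow_mut() = Some(Rc::clone(&n));
    }
    drops.get() // the node has leaked, so no destructor ran
}

// The same shape with a weak back-reference is reclaimed as expected.
fn weak_cycle_drops() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let n = Rc::new(WeakNode {
            next: RefCell::new(None),
            drops: Rc::clone(&drops),
        });
        *n.next.borrow_mut() = Some(Rc::downgrade(&n));
    }
    drops.get()
}

fn main() {
    println!("strong cycle destructors run: {}", strong_cycle_drops());
    println!("weak cycle destructors run: {}", weak_cycle_drops());
}
\end{lstrustsmall}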

Another common workaround is to store values in a vector and use
integer indices into that vector. Such indices are then morally closer to
machine level pointers than normal Rust references: the indices can become
stale, dangle, or may never have been valid in the first place. The programmer
must also manually deal with issues such as detecting unused values,
compaction, and so on. In other words, such workarounds force the programmer
to write a partial GC themselves. A variant on this idea is the
arena, which gradually accumulates multiple values but frees all of them in one go: values
can no longer be reclaimed too early, but, equally, individual values cannot be
reclaimed until all values are determined to be unneeded.
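
A minimal sketch of the index-based workaround, assuming a hypothetical
\lstinline{Graph} type that stores node values in a vector, shows how an index
can go stale or never have been valid, with nothing in the type system to
prevent its use:

\begin{lstrustsmall}
// Hypothetical index-based store: values live in a vector and are
// referred to by integer indices rather than by references.
struct Graph {
    nodes: Vec<Option<u8>>, // `None` marks a slot that has been freed
}

impl Graph {
    fn add(&mut self, value: u8) -> usize {
        self.nodes.push(Some(value));
        self.nodes.len() - 1
    }

    fn free(&mut self, idx: usize) {
        self.nodes[idx] = None;
    }

    fn get(&self, idx: usize) -> Option<u8> {
        self.nodes.get(idx).copied().flatten()
    }
}

fn stale_index_demo() -> (Option<u8>, Option<u8>, Option<u8>) {
    let mut g = Graph { nodes: Vec::new() };
    let a = g.add(1);
    let before = g.get(a);
    g.free(a); // `a` is now stale, but can still be used
    (before, g.get(a), g.get(42)) // 42 was never a valid index
}

fn main() {
    println!("{:?}", stale_index_demo());
}
\end{lstrustsmall}

Here every lookup must be checked at run-time, and the programmer remains
responsible for deciding when a slot can be freed or reused.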

A type-based approach is \lstinline{GhostCell}s~\cite{yanovski21ghostcell},
which allow data-structures that have multiple owners to be built
by statically guaranteeing that
there is a single owner for the entire data-structure at any point in the
program. However, this implicitly prevents multiple owners (e.g.~in different threads)
from reading or mutating different parts of the structure.

Although it is easily overlooked, some workarounds (e.g.~\lstinline{Rc<T>})
rely on using \emph{unsafe} Rust (i.e.~parts of the language, often involving
pointers, that are not fully statically checked by the compiler). It is reasonable
to assume that widely used code, even if technically unsafe, has been pored
over sufficiently that it is mostly, perhaps even wholly, reliable in practice. It
is less reasonable to assume that `new' solutions that a programmer implements using
unsafe Rust will immediately reach the same level of reliability. Our experience
is that most Rust programmers will avoid writing new unsafe code if they can, even
if doing so might in the long-term lead to a better system.

While we do not believe that every Rust program would be improved by GC, the
variety of workarounds already present in Rust code suggests that a substantial
subset might benefit from GC.


\subsection{GC terminology}

GC is a venerable field and has accumulated terminology that can seem
unfamiliar or unintuitive. We mostly use the same terminology
as~\cite{jones16garbage}, the major parts of which we define here.

A program which uses GC is split between the \emph{mutator} (the user's program) and
the \emph{collector} (the GC itself). At any given point in time, a particular thread is either
running as a mutator or a collector. In our context, all threads
run as a collector at least sometimes, with some threads always running as a collector.
Tracing and reclamation are performed during a \emph{collection} phase. In the context of
this paper, collections are \emph{stop-the-world}, where all mutator threads
are paused while collection occurs.

A \emph{tracing} GC is one that scans memory looking
for reachable objects from a program's roots: objects that are not reachable
from the roots can then be \emph{reclaimed}. In contrast, a reference counting GC does
not scan memory, and thus cannot free objects that form a cycle. As is common
in most GC literature, henceforth we use `GC' to mean `tracing GC' unless
explicitly stated otherwise.

We refer to memory which is solely under
the control of the GC as the \emph{GC heap}, though this may in practice be
intermingled with non-GCed memory. We use the term `GC value' to refer both to the pointer wrapped in a
\lstinline{Gc<T>} and the underlying value on the GC heap, even though multiple
pointers / wrappers can refer to a single value on the heap, unless doing so
would lead to ambiguity.

We use ``\ourgc'' to refer to the combination of: our extension to the Rust
language; our modifications to the \lstinline{rustc} compiler; and our
integration of the Boehm-Demers-Weiser (BDW) GC into the runtime of programs
compiled with our modified \lstinline{rustc}.


\section{\ourgc: Design and Implementation}
\label{sec:alloy_design}

In this section we outline \ourgc's basic design and implementation choices --
the rest of the paper then goes into detail on the more advanced aspects.


\subsection{Basic Design}

\ourgc provides a \lstinline{Gc<T>} type that exposes an API modelled on the
reference counting type \lstinline{Rc<T>} from Rust's standard library, because
\lstinline{Rc<T>}: is conceptually
similar to \lstinline{Gc<T>}; is widely used in Rust code, making its API
familiar; and has an API that reflects long-term experience about what Rust programmers
need from such a type.

When a user calls \lstinline{Gc::new(v)}, the value \lstinline{v} is
moved to the GC heap: the \lstinline{Gc<T>} value returned to the user is a
simple wrapper around a pointer to \lstinline{v}'s new address. Since the
purpose of a \lstinline{Gc<T>} is to allow a value to have multiple owners,
it must not only be possible to dereference it arbitrarily many times, but
the lifetimes of those references must be allowed to overlap. To avoid undermining
Rust's ownership system, this means that dereferencing a \lstinline{Gc<T>} must
produce an immutable (i.e.~`\lstinline{&}') reference to the underlying value.
If the user wishes to mutate the underlying value, they must use other Rust
types that enable \emph{interior mutability} (e.g.~\lstinline{RefCell<T>} or
\lstinline{Mutex<T>}).
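
The following sketch illustrates this pattern using \lstinline{Rc<RefCell<u8>>}
from the standard library as a stand-in for a GCed value (we assume, per the
description above, that \lstinline{Gc<T>} dereferences in the same way; the
function name is hypothetical). Overlapping immutable references are permitted,
while conflicting mutation is detected at run-time:

\begin{lstrustsmall}
use std::cell::RefCell;
use std::rc::Rc;

fn interior_mutability_demo() -> (u8, bool) {
    let shared = Rc::new(RefCell::new(1u8));
    let other = Rc::clone(&shared);
    // Dereferencing yields `&RefCell<u8>`: these references may overlap.
    let r1: &RefCell<u8> = &shared;
    let r2: &RefCell<u8> = &other;
    *r1.borrow_mut() += 1; // this mutable borrow ends with the statement
    let reader = r2.borrow();
    // A mutable borrow while `reader` is live is rejected at run-time.
    let conflict = r1.try_borrow_mut().is_err();
    (*reader, conflict)
}

fn main() {
    println!("{:?}", interior_mutability_demo());
}
\end{lstrustsmall}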

One feature that \ourgc explicitly adopts is \lstinline{Rc<T>}'s
ability to be transformed into a raw pointer (\lstinline{into_raw}) and
back (\lstinline{from_raw}). Though many programmers do not directly
encounter these functions, they are a crucial link between Rust's high and
low-level features (e.g.~being used for the C Foreign Function Interface
(FFI)). We believe that a viable GC for Rust must include this same
ability, but doing so has a profound impact: Rust allows raw pointers to be converted to
the integer type \lstinline{usize} and back\footnote{Although it is outside the
scope of this paper, it would arguably
be preferable for Rust to have different integer types for `data width' and
`address'. Modern C, for example, does this with the \lstinline{size_t} and
\lstinline{uintptr_t} types respectively. Rust now has a provenance lint to
nudge users in this general direction, but the \lstinline{as}
keyword still allows arbitrary conversions.}.
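
As an illustration of how a pointer can become disguised as an integer, the
following sketch uses the existing \lstinline{Rc::into_raw} and
\lstinline{Rc::from_raw} functions from the standard library (the function name
\lstinline{disguised_pointer_roundtrip} is hypothetical). Between the two casts,
the only record of the allocation's address is a \lstinline{usize}:

\begin{lstrustsmall}
use std::rc::Rc;

fn disguised_pointer_roundtrip() -> u8 {
    let rc = Rc::new(42u8);
    // Ownership of the allocation is now held by the raw pointer.
    let raw: *const u8 = Rc::into_raw(rc);
    // To the compiler, and to a precise GC, this is just an integer.
    let disguised: usize = raw as usize;
    let recovered = disguised as *const u8;
    // SAFETY: `recovered` has the address returned by `Rc::into_raw`
    // above and is converted back exactly once.
    let rc = unsafe { Rc::from_raw(recovered) };
    *rc
}

fn main() {
    println!("{}", disguised_pointer_roundtrip());
}
\end{lstrustsmall}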

\label{conservative_gc}
Having acknowledged that pointers may end up disguised as integers, it is then
inevitable that \ourgc must be a \emph{conservative} GC, which treats each
reachable machine word as a possible pointer: if,
when considered as a pointer, a word's value falls within a GCed block of memory,
then that block itself is considered reachable (and thus transitively scanned).
Since a conservative GC cannot know if a word is really a pointer, or is a sequence of
bits that merely happens to look like a valid pointer, this over-approximates the
\emph{live set} (i.e.~the blocks that the GC will not reclaim). However, the
most extensive study we know of suggests the false detection rate in Java
programs is under 0.01\% of live objects~\cite{shahriyar14fast}, so it is
rarely a problem in practice.
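
The core test performed by a conservative scan can be sketched as follows. This
is a deliberately simplified model, not \boehm's implementation: the
\lstinline{Block} type and \lstinline{conservative_mark} function are
hypothetical, blocks are modelled as address ranges, and the transitive scan of
each reachable block's contents is omitted.

\begin{lstrustsmall}
// Hypothetical model of a GCed block as an address range.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Block {
    start: usize,
    len: usize,
}

// Returns the blocks that a conservative scan of `words` keeps alive.
fn conservative_mark(words: &[usize], blocks: &[Block]) -> Vec<Block> {
    let mut live = Vec::new();
    for &w in words {
        for &b in blocks {
            // Any word whose value lies inside a block is treated as a
            // pointer to it, whether or not it really is one.
            let inside = w >= b.start && w < b.start + b.len;
            if inside && !live.contains(&b) {
                live.push(b);
            }
        }
    }
    live
}

fn main() {
    let blocks = [
        Block { start: 0x1000, len: 16 },
        Block { start: 0x2000, len: 16 },
    ];
    // 0x1008 may be a real pointer or an integer: the scan cannot tell.
    println!("{:?}", conservative_mark(&[7, 0x1008, 0x3000], &blocks));
}
\end{lstrustsmall}

An integer that happens to hold the value \lstinline{0x1008} keeps the first
block alive in exactly the same way as a genuine interior pointer would, which
is the source of the over-approximation.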

Conservative GC occupies a grey zone in programming language semantics. In most
languages -- including the semantics of most compiler Intermediate
Representations (IRs) -- conservative GC relies on undefined behaviour, and
some languages allow arbitrary `bit fiddling' on pointers that temporarily
obscures the address they are referring to. \ourgc's use of conservative GC means that it
is, formally speaking, unsound. However, conservative GC is widely used,
including in the two most widespread web browsers: Chrome uses it in its Blink rendering
engine~\citep{ager13oilpan} and Safari uses it in its JavaScript VM JavaScriptCore~\citep{pizlo17riptide}.
Even in 2024, we lack good alternatives to conservative GC: there is no cross-platform API for precise GC; and while some
compilers such as LLVM provide some support for GC features, we have found them
incomplete and buggy. Despite the potential soundness
worries, conservative GC thus remains a widely used technique.

Conservative GC enables \ourgc to make a useful ergonomic improvement over
most other GCs for Rust whose \lstinline{Gc<T>} is only \emph{cloneable}. Such types can be duplicated, but doing
so requires executing arbitrary user code. To make the possible run-time cost of this clear, Rust has
no direct syntax for cloning: users must explicitly call \lstinline{Rc::clone(&x)}
to duplicate a value \lstinline{x}. In contrast, since \ourgc's \lstinline{Gc<T>} is just a wrapper around a pointer it
is not just cloneable but also \emph{copyable}: duplication only requires copying
bytes (i.e.~no arbitrary user code need be executed). Copying is implied by assignment,
removing the need for a function call entirely\footnote{The lengthier
syntax \lstinline{y = Gc::clone(&x)} is available, since every copyable type is
also cloneable.}. This is not just a syntactic convenience but also reflects an underlying
semantic difference: duplicating a \lstinline{Gc<T>} in \ourgc is a cheaper and simpler operation
than in other GCs for Rust (or, indeed, than duplicating an \lstinline{Rc<T>}).
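
The difference between copyable and merely cloneable types can be sketched with
a hypothetical pointer wrapper \lstinline{Handle<T>}, which is our illustrative
stand-in and not \ourgc's actual \lstinline{Gc<T>} definition:

\begin{lstrustsmall}
use std::rc::Rc;

// Hypothetical pointer wrapper: duplicating it copies only pointer bits.
struct Handle<T> {
    ptr: *const T,
}
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Handle<T> {}

fn copy_vs_clone_demo() -> (bool, usize) {
    let value = 7u8;
    let h1 = Handle { ptr: &value as *const u8 };
    let h2 = h1; // implicit copy: `h1` remains usable afterwards

    let rc1 = Rc::new(7u8);
    // An explicit call, which runs code to increment the strong count.
    let _rc2 = Rc::clone(&rc1);
    (h1.ptr == h2.ptr, Rc::strong_count(&rc1))
}

fn main() {
    println!("{:?}", copy_vs_clone_demo());
}
\end{lstrustsmall}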


\subsection{Basic Implementation}

The most visible aspect of \ourgc is its fork of, and extension to, the standard
Rust compiler \rustc. We forked \rustc~\rustcversion and have
added or changed approximately 3,150 Lines Of Code (LOC).

\ourgc uses the conservative Boehm-Demers-Weiser GC (\boehm)~\citep{boehm88garbage} as its collector.
Although we use some uncommon, and extend other, parts of \boehm, there is
nothing inherently unique about \boehm from \ourgc's perspective.

We made the following changes to \boehm. First, we disabled \boehm's parallel collector threads
because, for reasons we do not fully understand, they worsen \ourgc's
performance. Second, \boehm cannot scan pointers stored in thread locals
because these are platform dependent. Fortunately, \rustc uses LLVM's
thread local storage implementation, which stores such pointers in the
\lstinline{PT_TLS} segment of the ELF binary: we modified \boehm to scan
this ELF segment during each collection. Third,
\boehm normally dynamically intercepts thread creation calls so that it can
then scan their stacks, but (for bootstrapping
reasons) it is unable to do so in our context: we explicitly changed \ourgc
to register new threads with \boehm. Fourth, we modified \boehm to run finalisers
on a separate thread (see \laurie{Section XYZ}).


\section{Destructors and Finalisers}
\label{sec:destructor challenges}

When a value in Rust is \emph{dropped} (e.g.~the value's owner went out of lexical
scope) its destructor is automatically run. Rust destructors are formed of two
parts, run in the following order: a user-defined \emph{drop method}; and
automatically inserted \emph{drop glue}. Drop methods are optional; users
can provide one for a type by implementing the \lstinline{Drop} trait's \lstinline{drop}
method. Drop glue recursively calls destructors of contained types (e.g.~fields
in a \lstinline{struct}). Although it is common usage to conflate `destructor' in
Rust with drop methods, drop glue is an integral part of a Rust destructor:
we therefore use `destructor' as the umbrella term for both drop methods and drop glue.
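
The ordering of the two parts can be observed directly in standard Rust. In the
following sketch (all type and function names are hypothetical), the drop method
of \lstinline{Outer} runs first, after which drop glue runs the destructors of
its fields in declaration order:

\begin{lstrustsmall}
use std::cell::RefCell;

thread_local! {
    // Records the order in which destructors run on this thread.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Field(&'static str);
impl Drop for Field {
    fn drop(&mut self) {
        LOG.with(|l| l.borrow_mut().push(self.0));
    }
}

struct Outer {
    _a: Field,
    _b: Field,
}
impl Drop for Outer {
    // The user-defined drop method.
    fn drop(&mut self) {
        LOG.with(|l| l.borrow_mut().push("Outer::drop"));
    }
    // Drop glue for `_a` and `_b` is inserted by the compiler after this.
}

fn drop_order() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    drop(Outer { _a: Field("a"), _b: Field("b") });
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    println!("{:?}", drop_order());
}
\end{lstrustsmall}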

Rust's destructors enable a style of programming that originated in C++ called RAII (Resource
Acquisition Is Initialization)~\cite[Section~14.4]{stroustrup97c++}: when a
value is dropped, the underlying resources (e.g.~file handles or mutexes) it
acquired when created are released. Whether one considers all such uses
to be RAII or not, a brief perusal of Rust code will quickly show both that
types that have drop methods are used frequently, and that users fairly often
implement their own drop methods.

Existing GCs for Rust have very separate notions of destructors and finalisers.
Where the former have the \lstinline{Drop} trait, the latter typically have
a \lstinline{Finalize} trait. If a user type needs to be finalised, then
the user must provide an implementation of the \lstinline{Finalize} trait for that type.
However, doing so introduces a number of problems:
(1) external libraries are
unlikely to provide finalisers; (2) Rust's \emph{orphan rule} \laurie{jake: is
https://rust-lang.github.io/chalk/book/clauses/coherence.html the best
reference for this?} prevents one implementing traits for types defined in
external libraries (i.e.~unless a library's types were designed to support
\lstinline{Gc<T>}, those types cannot be directly GCed); (3) one cannot
automatically replicate drop glue for finalisers; and (4) one cannot replicate
\rustc's refusal to allow calls to the equivalent of \lstinline{Drop::drop}.

Programmers can work around problems (1) and (2) in various ways. For example,
they can wrap external library types in \emph{newtypes} (zero-cost wrappers)
and implement finalisers on those instead. Doing so involves drudgery, but
little conceptual difficulty.
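
The newtype workaround can be sketched in standard Rust. Since we cannot assume
any particular GC library here, the sketch uses \lstinline{std::fmt::Display} as
a stand-in for an externally defined \lstinline{Finalize} trait; the
\lstinline{Bytes} newtype and \lstinline{newtype_demo} function are hypothetical:

\begin{lstrustsmall}
use std::fmt;

// Both `fmt::Display` and `Vec<u8>` are defined outside this crate, so
// the orphan rule rejects the following impl (error E0117):
//
//     impl fmt::Display for Vec<u8> { ... }

// A newtype is local to this crate, so the trait can be implemented on it.
struct Bytes(Vec<u8>);

impl fmt::Display for Bytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} bytes", self.0.len())
    }
}

fn newtype_demo() -> String {
    Bytes(vec![1, 2, 3]).to_string()
}

fn main() {
    println!("{}", newtype_demo());
}
\end{lstrustsmall}

The drudgery lies in having to define such a wrapper, and forward any needed
methods, for every external type that is to be stored in a GCed value.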

Problem (3) has partial solutions: for example,~\cite{manish15rustgc} uses the
\lstinline{Trace} macro to generate `finaliser glue' (our term) for
\lstinline{struct} fields. This runs into an unsolvable variant of problem (2):
types in external libraries will not implement this trait and cannot be
recursively scanned for finaliser glue.

Problem (4) is impossible to solve in Rust as-is. One cannot define a function
that can never be called --- what use would such a function have? It might seem
tempting to have the \lstinline{finalize} method take ownership of the value,
but \lstinline{Drop::drop} does not do so because that would not allow drop
glue to be run afterwards.

In summary: destruction is a core part of Rust; a GC for Rust ideally needs to
maintain most if not all of those properties for finalisation; but some parts
cannot be replicated in normal user code.


\subsection{General Challenges When Using Destructors as Finalisers}

Even if there were no Rust-specific challenges to using destructors as
finalisers, we would still face the problem that finalisers and destructors
have different, and sometimes incompatible, properties. The best overall
guide to these differences, and the resulting problems, is~\cite{boehm03destructors},
supplemented by later work by some of the same authors on support
for GC in the C++ specification~\cite{boehm09garbage}\footnote{These features
were added to the C++11 specification, but do not seem to have been implemented by
compilers. C++23 removed these features.}.

An obvious difference between destructors and finalisers is when each
is run. Where C++ and Rust define
precisely when a destructor will be run\footnote{Mostly. Rust's `temporary
lifetime extension' delays destruction, but for how long is currently
unspecified.}, finalisers run at an unspecified point in time
after the last reference to a GCed value is removed. However, in some situations
finalisers can be run too early: valid optimisations (e.g.~scalar replacement)
can cause a value to appear to be unused, and thus ready for finalisation,
even though some of its constituent parts are still being used.

A less obvious difference relates to where destructors and finalisers are run.
Destructors are run in the thread that the last owner of the value ran in.
However, running finalisers in the same thread as the last owner of the value
ran in can cause race conditions and deadlocks if one or more finalisers try to access a resource that the mutator
expects to have exclusive access to~\cite[Section~3.3]{boehm03destructors}.
Although such problems can
affect destructors, it is a clear case of programmer error, since they should
have taken into account the predictable execution point of destructors. Since
finalisers have no such predictable execution point, there is no way for finalisers
to safely access shared resources if they are run on the same thread. Such
problems can only be avoided by running
finalisers on a non-mutator thread: however, not all Rust destructors
are safe to run on another thread.
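
The same-thread problem can be simulated in standard Rust. In the following
sketch (the \lstinline{finaliser} and \lstinline{same_thread_demo} functions are
hypothetical, and the `collection' is simulated by an ordinary function call),
a finaliser that needs a lock already held by the mutator on the same thread
cannot make progress. We use \lstinline{try_lock} so that the example
terminates: a blocking \lstinline{lock} call in the same position would never
return (the standard library documents that it may deadlock or panic).

\begin{lstrustsmall}
use std::sync::Mutex;

// Hypothetical finaliser needing a resource shared with the mutator.
// Returns whether it was able to acquire the resource.
fn finaliser(resource: &Mutex<Vec<u8>>) -> bool {
    match resource.try_lock() {
        Ok(mut guard) => {
            guard.push(0);
            true
        }
        Err(_) => false,
    }
}

fn same_thread_demo() -> (bool, bool) {
    let resource: Mutex<Vec<u8>> = Mutex::new(Vec::new());
    let while_locked = {
        let _guard = resource.lock().unwrap(); // the mutator holds the lock
        // Imagine a collection occurring here, with the finaliser being
        // run on this same thread before the mutator resumes.
        finaliser(&resource)
    };
    let after_unlock = finaliser(&resource);
    (while_locked, after_unlock)
}

fn main() {
    println!("{:?}", same_thread_demo());
}
\end{lstrustsmall}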

Perhaps surprisingly, the combination of finalisation and cycles is problematic: finalisers
can reference other GCed values that are partly, or wholly, `finalised' and may
even have had their backing memory freed or reused. A related, but
distinct, problem is the ability of finalisers to `resurrect' values by
copying the reference passed to the finaliser and storing it somewhere.


\subsection{The Challenge of Finalisers for \ourgc}

At this point we hope to have convinced the reader of two general points: a
viable GC for Rust needs to be able to use existing destructors as finalisers
whenever possible; and that finalisers, even in existing GCs, cause
various problems.

Over time, finalisers have come to be viewed with increasing suspicion. Java,
for example, has deprecated, and intends eventually to remove, per-type
finalisers: instead it has introduced deliberately less flexible per-object `cleaners', whose API
prevents problems such as object resurrection~\cite{goetz21deprecated}. It
is important to differentiate such mechanisms from the \lstinline{Finalize} trait that many existing GCs for
Rust force users to manually implement: cleaners impose restrictions
to make finalisers less problematic; existing \lstinline{Finalize} traits
do not impose any such restrictions.

Our desire that \ourgc should use existing Rust destructors as finalisers whenever
possible may thus seem out of reach. Indeed, in the nearest
context to our work, GC for C++, this solution was explicitly ruled out as the
problems were thought insoluble~\cite[p.~32]{boehm09garbage}. We break these
problems down into four:
(1) finalisers are prohibitively slower than destructors;
(2) finalisers can be run too early;
(3) running finalisers on the same thread as a paused mutator can cause race conditions and deadlocks;
and (4) some safe destructors are not safe finalisers.

Fortunately for us, Rust's unusual static guarantees, suitably expanded by
\ourgc, allow us to address each problem in a satisfying way. In the following
section, we tackle these problems in the order above, noting that we tackle problems
(1) and (2) separately, and (3) and (4) together.


\section{Finaliser Elision}

Finalisers are slower to run than destructors --- running every Rust destructor
as a GC finaliser in \ourgc is prohibitively slow (\laurie{give rough figures
e.g.~a range}). In this section we show how to sidestep this problem in many
practical situations.

A variety of factors contribute to the performance overhead of finalisers, such as:
a queue of finalisers
must be maintained, whereas destructors can be run immediately; since finalisers run
some time after the last access of a value, they are more likely to cause
cache misses; and so on. Most of these factors are inherent to any GC and
our experience of using and working on \boehm -- a mature, widely used GC -- does
not suggest that major optimisation potential remains. In other words, it is unlikely that \boehm could be
sufficiently optimised to overcome the performance penalty of finalisers we observe.

Instead, whenever possible, \ourgc \emph{elides} finalisers so that they do not need to be run at all.
We are able to do this because many Rust destructors
solely do work that a conservative GC does anyway --- when used as finalisers,
such destructors are thus unnecessary and can be elided.

Consider the type \lstinline{Box<T>} which heap allocates space for a value;
when a \lstinline{Box<T>} value is dropped, the heap allocation will be freed
by \lstinline{Box}'s drop method. In a naive implementation,
\lstinline{Gc<Box<T>>} will, when no GC values remain, run a finaliser which
then runs \lstinline{Box}'s destructor.

Finaliser elision rests on two observations. First,
\lstinline{Box<T>}'s drop method solely consists of a call to \lstinline{free}\footnote{The
function in the Rust API is actually called \lstinline{deallocate}, but that
jars with the terminology we use elsewhere.}. Second, while we informally say
that \lstinline{Box<T>} allocates on the `heap' and \lstinline{Gc<T>} allocates
on the `GC heap', all allocations are made through \boehm and stored in
a single heap. Thus, when used as a finaliser,
\lstinline{Box<T>}'s drop method is unneeded\footnote{There is a subtle asymmetry here: we
cannot elide the destructors themselves, as they are still necessary to ensure
predictable destruction in non-\lstinline{Gc<T>} Rust code.}, as the underlying memory will
naturally be freed by \boehm anyway\footnote{Indeed, since the drop method only
calls \lstinline{free}, any resulting finaliser would cause the underlying
allocation to live longer than if the finaliser was not present!}.

This means that there is no need to run a finaliser for a type such as
\lstinline{Gc<Box<u8>>} and we can statically elide it. However, we cannot
elide every \lstinline{Gc<Box<T>>} finaliser: for example, \lstinline{Gc<Box<Rc<u8>>>}
requires a finaliser because \lstinline{Rc<T>} needs to decrement a reference
count. This may seem confusing, because in both cases \lstinline{Box<T>}'s drop
method is the same: however, the drop glue added for \lstinline{Box<Rc<u8>>}
causes \lstinline{Rc<T>}'s destructor to be run.
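
The effect of this drop glue is observable in standard Rust, without any GC. In
the following sketch (the function name is hypothetical), dropping a
\lstinline{Box<Rc<u8>>} decrements the reference count of the contained
\lstinline{Rc<u8>}, which is work that freeing the box's memory alone would not
perform:

\begin{lstrustsmall}
use std::rc::Rc;

fn box_rc_demo() -> (usize, usize) {
    let rc = Rc::new(1u8);
    let boxed: Box<Rc<u8>> = Box::new(Rc::clone(&rc));
    let before = Rc::strong_count(&rc);
    // Drop glue for `Box<Rc<u8>>` runs `Rc`'s destructor, which
    // decrements the strong count.
    drop(boxed);
    (before, Rc::strong_count(&rc))
}

fn main() {
    println!("{:?}", box_rc_demo());
}
\end{lstrustsmall}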


\subsection{Implementing finaliser elision}

\begin{algorithm}
\caption{Finaliser Elision}
\label{alg:elision}
\begin{algorithmic}
\Function{RequiresFinalizer}{$T$}
    \If {\Call{Impls}{$T$, $Drop$} \AND \NOT \Call{Impls}{$T$, $DropMethodFinalizerElidable$}}
        \State \Return{true}
    \EndIf
    \ForEach {$field \in T$}
        \If{\Call{RequiresFinalizer}{$field$}}
            \State \Return{true}
        \EndIf
    \EndFor
    \State \Return{false}
\EndFunction
\end{algorithmic}
\end{algorithm}

The aim of finaliser elision is to statically determine which types' destructors do
not require corresponding finalisers and elide them. Notably, finaliser elision
must deal correctly with nested data types.
The high-level algorithm is shown in~\cref{alg:elision}.

In essence, the algorithm has to consider three things. First, types that do not
implement the \lstinline{Drop} trait are clearly candidates for elision. Second,
some types that implement the \lstinline{Drop} trait are also candidates
for elision (e.g.~\lstinline{Box<T>}), but these cannot be automatically determined,
as it would require fully understanding the logic inside the type's drop method.
Instead, the user must signify this to \ourgc manually by implementing the new
unsafe \emph{marker trait} (i.e.~a trait without methods that \ourgc can recognise)
\lstinline{DropMethodFinalizerElidable}:

\begin{lstrustsmall}
unsafe impl<T> DropMethodFinalizerElidable for Box<T> {}
\end{lstrustsmall}

\lstinline{DropMethodFinalizerElidable} requires a careful understanding of a
type's drop method: incorrectly implementing this trait will change a program's
semantics.

Third, we must recursively check a type's fields to see if any require
finalisation --- in other words, we check what the drop glue for a type will do.
A type \lstinline{T<U>} might implement the
\lstinline{DropMethodFinalizerElidable} trait, but if its type parameter \lstinline{U} does
not, and \lstinline{U} implements the \lstinline{Drop} trait, then
\lstinline{T<U>} must have its finaliser run.
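
The three considerations above can be sketched over a toy model of types. This
is an illustration only, not \ourgc's implementation: the
\lstinline{TypeInfo} struct and the helper constructors are hypothetical
stand-ins for the information the compiler has about a type.

```rust
/// A toy description of a type, standing in for the compiler's view.
pub struct TypeInfo {
    /// Does the type itself implement `Drop`?
    pub impls_drop: bool,
    /// Does it implement the `DropMethodFinalizerElidable` marker?
    pub elidable: bool,
    /// Types of the fields (or type parameters) that drop glue visits.
    pub fields: Vec<TypeInfo>,
}

/// Mirrors the recursive check: a non-elidable `Drop` impl anywhere in
/// the type's structure means a finaliser is required.
pub fn requires_finalizer(ty: &TypeInfo) -> bool {
    if ty.impls_drop && !ty.elidable {
        return true;
    }
    ty.fields.iter().any(requires_finalizer)
}

/// Hypothetical model of `u8`: no `Drop` impl, no fields.
pub fn u8_ty() -> TypeInfo {
    TypeInfo { impls_drop: false, elidable: false, fields: vec![] }
}

/// Hypothetical model of `Rc<T>`: a `Drop` impl that is not elidable.
pub fn rc_of(inner: TypeInfo) -> TypeInfo {
    TypeInfo { impls_drop: true, elidable: false, fields: vec![inner] }
}

/// Hypothetical model of `Box<T>`: a `Drop` impl marked as elidable.
pub fn box_of(inner: TypeInfo) -> TypeInfo {
    TypeInfo { impls_drop: true, elidable: true, fields: vec![inner] }
}
```

Under this model \lstinline{box_of(u8_ty())} needs no finaliser, while
\lstinline{box_of(rc_of(u8_ty()))} does, because the recursion reaches the
non-elidable \lstinline{Rc}.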

\ourgc modifies the standard Rust library to implement
\lstinline{DropMethodFinalizerElidable} on the following types: \lstinline{Box<T>},
\lstinline{Vec<T>}, \lstinline{RawVec<T>}, and \lstinline{HashMap<K, V>}. As these
types are widely used in real Rust code, this often leads to significant
numbers of finalisers being elided.

\begin{figure}[t]
  \begin{lstlisting}[language=Rust, numbers=none,
    label={listing:elision_in_rustc}, caption={
      Finaliser elision inside \ourgc itself. A new \lstinline{const}
      (i.e.~compile-time evaluated) compiler intrinsic
      \lstinline{requires_finalizer} returns true if a finaliser is required for a
      type. The \lstinline{Gc<T>} type uses this intrinsic to ensure that the
      value is registered as requiring a finaliser.
    }]
impl<T> Gc<T> {
  pub fn new(value: T) -> Self {
    if requires_finalizer::<T>() { Gc::<T>::new_with_finalizer(value) }
    else { Gc::<T>::new_without_finalizer(value) }
    ...
  }
}
\end{lstlisting}
\end{figure}

\cref{listing:elision_in_rustc} shows how finaliser elision is used inside the compiler. \ourgc
introduces a new \lstinline{const} compiler intrinsic
\lstinline{requires_finalizer} (based on~\cref{alg:elision}) which returns true if a finaliser is required for a
type. \lstinline{Gc<T>}'s constructor uses this intrinsic to decide whether a
finaliser must be registered or not. Because \lstinline{requires_finalizer} is
evaluated at compile-time, it can be inlined into \lstinline{Gc::new},
allowing the associated conditional to be removed too. In other words --
compiler optimisations allowing -- the `does this specific type require a
finaliser' checks have no run-time cost at all.
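
The effect of compile-time evaluation can be approximated in stable Rust. The
sketch below is an analogy under stated assumptions, not \ourgc's code: stable
Rust has no \lstinline{requires_finalizer} intrinsic, so we substitute the
standard library's \lstinline{std::mem::needs_drop}, which is also a
\lstinline{const fn} but is coarser (it reports \lstinline{true} for
\lstinline{Box<u8>}, which \ourgc would elide). The \lstinline{Tracked} and
\lstinline{Registration} types are hypothetical.

```rust
use std::mem::needs_drop;

/// Records which registration path a constructor took.
#[derive(Debug, PartialEq, Eq)]
pub enum Registration {
    WithFinalizer,
    WithoutFinalizer,
}

/// A stand-in for `Gc<T>` that only remembers how it was registered.
pub struct Tracked<T> {
    pub value: T,
    pub registration: Registration,
}

impl<T> Tracked<T> {
    // Evaluated at compile time, once per instantiation of `T`.
    const NEEDS_FINALIZER: bool = needs_drop::<T>();

    pub fn new(value: T) -> Self {
        // The condition is a constant for each `T`, so an optimising
        // compiler can remove the branch that is never taken.
        let registration = if Self::NEEDS_FINALIZER {
            Registration::WithFinalizer
        } else {
            Registration::WithoutFinalizer
        };
        Tracked { value, registration }
    }
}
```

Because the condition is constant for any given \lstinline{T}, each
monomorphised copy of \lstinline{new} contains only one of the two paths once
optimisations are applied.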


\section{Finaliser Safety Analysis}

In this section we address two problems: running finalisers on the same thread
as a paused collector can cause deadlocks; and some safe destructors are not
safe finalisers. Addressing the first problem is conceptually simple --
finalisers must be run on a separate thread -- but we must ensure that doing so
is sound. We therefore consider this a specific instance of the second problem.

Our solution is the novel technique of
\emph{finaliser safety analysis}. We extend Rust's static analyses, and alter the relevant parts
of \rustc, to reject a type \lstinline{T}'s destructor if it
is not provably safe to be used as a finaliser in a \lstinline{Gc<T>}.
To the best of our knowledge, no other GCed language has an
equivalent of finaliser safety analysis: indeed, we rely on some of Rust's less
common static guarantees.

In this section we first define the ways that reusing destructors as finalisers
would undermine safety, and the general rules for identifying the troublesome
cases. We then explain how we changed \rustc to implement these rules in \ourgc.


\subsection{Rust references}

\begin{figure}[t]
\centering
\begin{minipage}{0.50\textwidth}
\lstinputlisting[language=Rust,firstline=9,lastline=18]{listings/dangling_reference.rs}
\subcaption{}
\end{minipage}%
\begin{minipage}{0.50\textwidth}
\begin{lstlisting}[language=Rust, numbers=none]
error: `Node{value: &*b, nbr: None}` cannot be safely
       constructed.
 |  let gc1 = Gc::new(Node{value: &*b, nbr: None});
 |            --------^^^^^^^^^^^^^^^^^^^^^^^^^^^-
 |            |       |
 |            |       contains a reference (&) which
 |            |       may no longer be valid when it
 |            |       is finalized.
 |            `Gc::new` requires that a type is
 |            reference free.
\end{lstlisting}
\subcaption{}
\end{minipage}
\label{fig:dangling_reference}
\captionof{lstlisting}{A \lstinline{u64} is
  placed on the heap using a \lstinline{Box}, and an immutable reference to it
  is stored inside the \lstinline{Gc<Node>}. Without finalisation, this is
  valid as long as \lstinline{gc1} does not outlive \lstinline{b}. However, the
  use of the references from the \lstinline{drop} method is unsound. This is
  because the collector is likely to schedule the finaliser to run after the
  \lstinline{Box<u64>} is freed when \lstinline{b} goes out of scope. The
  dereference on line 3 would thus access a dangling reference causing a
  use-after-free.\laurie{this example slightly contradicts the text: is the
  problem the existence of reference + finaliser OR the use of the reference in
  drop? I'm pretty sure it's the first of these two possibilities, but
  I'd like to check!}}
\end{figure}

A \lstinline{Gc<T>} can store (directly or indirectly) normal Rust references
(i.e.~\lstinline{&} and \lstinline{&mut}), at which point the \lstinline{Gc<T>}
is subject to Rust's normal borrow checker rules and cannot outlive the
reference. Finalisers are fundamentally incompatible with this, because the
finaliser will be run after the \lstinline{Gc<T>}'s owner has gone out of scope:
if the finaliser were to make use (even indirectly) of a reference, we would
have subverted the borrow checker.

\ourgc's solution to this is simple: \lstinline{Gc<T>} can include a
reference if and only if \lstinline{T} does not need finalising. For example,
\lstinline{Gc<&u8>} is permitted, because integers do not need finalising. However,
\lstinline{Gc<Node<&Box<T>>>} is not permitted, because \lstinline{Box} needs
its finaliser to be run in order to free its backing memory. \cref{fig:dangling_reference}
shows the error that \ourgc raises in such a case.

While we could somewhat relax this rule (e.g.~finalisers which provably do not,
directly or indirectly, use a reference could be let through), there are few
benefits to doing so: after all, \lstinline{Gc<T>} is explicitly designed to
help users avoid ownership restrictions. That said, finaliser elision
(see~\cref{XYZ}) does reduce the number of types which need finalisation, thus
enabling more types to be embedded as references, for those who really want to.

\laurie{this paragraph again suggests that we look into the finaliser, but
elsewhere we suggest we don't look into the finaliser.}
As with references, this same unsoundness can be caused by storing raw pointers
inside a \lstinline{Gc} (either immutable \lstinline{*const} or mutable
\lstinline{*mut}) and then dereferencing them in a finaliser. However, \ourgc
is sound even without enforcing this rule for raw pointers because dereferencing raw pointers
is already not possible in safe Rust code, and programmers must ensure the
pointer is valid for each dereference anyway.

To check whether a type passed to \lstinline{Gc} contains a reference, \ourgc
defines a \emph{marker} (i.e.~without methods) \emph{auto trait}
(i.e.~automatically implemented for types which adhere to its restrictions)
\lstinline{ReferenceFree}. \ourgc automatically implements
\lstinline{ReferenceFree} for each type that does not itself contain a
type that is not \lstinline{ReferenceFree}. Primitive types such as integers
are \lstinline{ReferenceFree} by default; crucially (and at the risk of stating
the obvious) references themselves are not \lstinline{ReferenceFree}. Thus a
non-\lstinline{ReferenceFree} type `pollutes' its container type(s) in the
manner we desire\footnote{\laurie{``As with all auto traits?``}, users can
force a type to implement \lstinline{ReferenceFree}, but only in unsafe Rust.}.

\laurie{``this is then checked in the \lstinline{CheckCallSite} phase`` is that part of FSA or normal \rustc?}




% \subsubsection{Need to introduce concurrency}
% \label{sendsync}
%
% The Rust type system is able to guarantee statically that values are being used
% in a thread-safe way. It does this with the use of two special "marker" traits:
% \lstinline{Send} and \lstinline{Sync}.
%
% A value whose type implements the \lstinline{Send} trait can be transferred to
% other threads. Almost all types in Rust implement the \lstinline{Send} trait
% save for a few exceptions. One such exception is the \lstinline{Rc<T>}
% (reference counting) type. This does \emph{not} implement \lstinline{Send}
% because it does not perform count operations on its underlying contents
% atomically. If it were sent to another thread, it could race if both threads
% tried to update the count simultaneously. In contrast, the \lstinline{Arc<T>}
% type (an atomic implementation of \lstinline{Rc<T>}) does implement
% \lstinline{Send} provided its inner type \lstinline{T} is also \lstinline{Send}.
%
% The corollary to \lstinline{Send} is the \lstinline{Sync} trait, which can be
% implemented on types whose values perform mutation in a thread-safe manner. For
% example, a \lstinline{Mutex<T>} implements \lstinline{Sync} because it provides
% exclusive access to its underlying data atomically. On the other hand, a
% \lstinline{RefCell} (discussed in \autoref{intmut}) does not implement
% \lstinline{Sync} because it only provides single-threaded interior
% mutability. This is because its underlying mechanism to increment and decrement
% borrow counts are not performed atomically.
%
% The \lstinline{Send} and \lstinline{Sync} marker traits are a special kind of
% trait known as \emph{auto traits}. This means that they are automatically
% implemented for every type, unless the type, or a type it contains, has
% explicitly opted out. Types can be opted out via a \emph{negative impl}:

% \begin{lstrustsmall}
%     impl !Send for T {}
% \end{lstrustsmall}

% If, for example, a struct \lstinline{S} contains a field of type \lstinline{T}
% from the example above, then the entire struct \lstinline{S} will also not
% implement \lstinline{Send}.
%
% \lstinline{Send} and \lstinline{Sync} can be manually implemented on types, but
% doing so requires \lstinline{unsafe} Rust code since the user must guarantee it
% is safe to use in a multi-threaded context.


\subsection{Finalisation order}

Some GC implementations guarantee a finalisation order, because some
applications require one resource to be cleaned up before
another.

However, this guaranteed finalisation order has two disadvantages. First,
finalising all the objects in a chain of floating garbage happens over multiple
finalisation cycles because an object can only be finalised if it is not
reachable from other unreachable objects. Such a delay in the eventual
reclamation of objects can cause \emph{heap drag}, where unreachable objects
are kept alive longer than necessary.

Second, and most significantly, this approach is not able to finalise cycles of
objects where more than one object needs finalising, which can lead to resource
leaks. This is because the collector cannot know which (if any) object is safe
to finalise first: if an object references another object which has already
been finalised, this is unsound. Boehm proposes a workaround for this where
programmers can refactor the objects in order to break the finalisation
cycle~\citep{boehm03destructors}. Another workaround is to allow users to use
weak references to break cycles (similar to breaking reference count cycles).
Unfortunately this is often difficult to implement
correctly~\citep{jones16garbage} \laurie{we need a page reference for this}.

\ourgc uses an alternative approach: it does not specify any finalisation
order. This allows objects in cycles to be finalised, but with a heavy
constraint: finalisers must not reference other GC'd objects.
This restriction is less onerous for \ourgc than it would be elsewhere, because
\ourgc is not intended to replace Rust's RAII
approach to memory management; instead, it provides optional GC for objects
where the RAII approach is difficult or impossible. It is not uncommon to see
Rust programs which use \ourgc with a mix of GC'd and non-GC'd objects. In such
cases, it is safe for a finaliser to access a field of a non-GC'd object
because there is no danger of that object having been finalised already.

In addition, one of \ourgc's main goals is to make it easier to work with data
structures that have cycles in Rust. It is suggested that finalisation cycles
are rare in GC'd languages~\citep{jones16garbage}. However, this is different for
\ourgc, since destructors in Rust are common, and mapping them to finalisers
means that it is not uncommon to see finalisation cycles in Rust programs using
\ourgc.

The problem with finalising \lstinline{Gc}s in an unspecified order is that any
other \lstinline{Gc} object which they reference may have already been freed.
Consider the example in \autoref{fig:unsound_finalisation_cycle}, which
allocates \lstinline{Gc}s which reference each other in a cycle. A cycle such as
this one will always lead to unsoundness with finalisers which access other
\lstinline{Gc} objects. This is also a problem for non-cyclic data structures in
\ourgc's non-ordered configuration as there is no guarantee that non-cyclic data
structures will be finalised `outside-in'.

\autoref{fig:unsound_cycle} shows an example of how dereferencing another
\lstinline{Gc} can be unsound.

\begin{figure}[t]
\centering
\begin{minipage}{0.50\textwidth}
\lstinputlisting[language=Rust,firstline=5]{listings/finalisation_cycle.rs}
\subcaption{}
\end{minipage}%
\begin{minipage}{0.50\textwidth}
\begin{lstlisting}[language=Rust, numbers=none]
error: `n1` cannot be safely finalized.
   |
4  | let nbr = self.nbr.unwrap();
   |           --------
   |           |
   |           caused by the expression in `fn drop(&mut)`
   |           here because it uses a type which is not
   |           safe to use in a finalizer.
...
11 | let gc1 = Gc::new(n1);
   |                   ^^ has a drop method which cannot
   |                      be safely finalized.
   |
   = help: `Gc` runs finalizers on a separate thread,
           so drop methods must only use values whose
           types implement `Send + Sync + FinalizerSafe`.
\end{lstlisting}
\subcaption{}
\end{minipage}
\label{fig:finalisation_cycle}
\captionof{lstlisting}{An example of finaliser safety analysis preventing an
  unsound Rust program. \textbf{(b)} shows the compile-time error message
  produced when attempting to compile the example in \textbf{(a)}. FSA identifies
  that another \lstinline{Gc} is being dereferenced inside a finaliser. This is
  unsound when \ourgc does not use ordered finalisation, as the other
  \lstinline{Gc} may already have been collected.}
\end{figure}


\subsubsection{Destructors Must Be \lstinline{Send}able To Another Thread}

It is common in many languages for finalisers to access fields from other
objects or even global state. Since an object's finaliser is run at some unknown
point in time once it is considered unreachable by the collector, it must be
able to safely access such state without racing with the mutator.

One possible solution might seem to be to prohibit finalisers from acquiring
locks. However, this can still cause race-like bugs because of how finalisers can
interleave asynchronously with the mutator~\citep{niko13destructors}.

In addition, \ourgc schedules all finalisers to run on a separate thread to the
mutator. As with any GC, \ourgc can finalise objects at its leisure, with no
way for the programmer to know when this will happen. Rust does not prevent
users from writing code which deadlocks, but it does not make the situation
worse. However, if \ourgc were to perform on-thread finalisation, it would open
the possibility of previously non-deadlocking code deadlocking.
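
The hazard can be demonstrated in standard Rust without a collector. The
following is a sketch under stated assumptions: we simulate an on-thread
finaliser by explicitly dropping a value while the same thread holds a lock,
and the destructor uses \lstinline{try_lock} rather than \lstinline{lock} so
that the example reports the problem instead of hanging. The
\lstinline{Decrement} type and the function name are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// A value whose destructor needs the shared lock.
pub struct Decrement {
    counter: Arc<Mutex<u64>>,
    /// Set to true if the destructor found the lock already held.
    would_block: Arc<AtomicBool>,
}

impl Drop for Decrement {
    fn drop(&mut self) {
        // A realistic destructor would call `lock()` here. We use
        // `try_lock()` so the sketch records the hazard and returns.
        match self.counter.try_lock() {
            Ok(mut n) => *n -= 1,
            Err(_) => self.would_block.store(true, Ordering::SeqCst),
        }
    }
}

/// Simulates a finaliser running on the mutator thread while the
/// mutator holds the lock. Returns whether the finaliser would block.
pub fn on_thread_finaliser_would_block() -> bool {
    let counter = Arc::new(Mutex::new(1u64));
    let would_block = Arc::new(AtomicBool::new(false));
    let garbage = Decrement {
        counter: Arc::clone(&counter),
        would_block: Arc::clone(&would_block),
    };

    // The mutator enters its critical section.
    let guard = counter.lock().unwrap();
    // Stand-in for the collector finalising `garbage` on this thread.
    drop(garbage);
    drop(guard);

    would_block.load(Ordering::SeqCst)
}
```

With \lstinline{lock} in place of \lstinline{try_lock}, the destructor would
wait for a lock that its own thread holds and can never release.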

\begin{figure}[t]
\lstinputlisting[language=Rust, firstline=10]{listings/finaliser_deadlock.rs}
\captionof{lstlisting}{An example showing how a potential deadlock can be
  caused if finalisers are run on the mutator thread. A shared counter is
  created using a reference counted container, and a reference to this is then
  placed inside a garbage collected container. This is potentially short-lived,
  as it is only reachable by the \lstinline{gc} variable until the end of the
  inner scope. If \ourgc decides to schedule a collection after this then the
  \lstinline{Rc} could be considered garbage, and a finaliser would run its
  drop method (line 10). If this happens while the main mutator thread already
  holds \lstinline{counter.lock()} (lines 32--36) then this program can deadlock.
  When finalisers are run on the same thread, there is nothing the programmer
  can do in this situation to guarantee that this kind of deadlock does not
  occur.}
\label{fig:finaliser_deadlock}
\end{figure}

Unfortunately, finalising objects off-thread means that any shared data or other
objects used by a finaliser must be accessed in a thread-safe way. The problem
with this is that in \ourgc, a finaliser calls a type's existing drop method.
Since \lstinline{Drop} was not originally defined in expectation of being
called on a separate thread, it does not guarantee thread safety.
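
Standard Rust already expresses a similar requirement when a value is moved to
another thread and dropped there. The sketch below is an analogy, not \ourgc's
mechanism: \lstinline{finalise_off_thread} and \lstinline{Resource} are
hypothetical names, and the \lstinline{Send} bound is the one that
\lstinline{std::thread::spawn} imposes on values moved into the new thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// A value whose destructor updates shared state atomically.
pub struct Resource {
    live: Arc<AtomicUsize>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.live.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Moves `value` to a separate thread and runs its destructor there.
/// `thread::spawn` requires everything moved into it to be `Send`.
pub fn finalise_off_thread<T: Send + 'static>(value: T) {
    thread::spawn(move || drop(value)).join().unwrap();
}

/// Creates one live resource, drops it off-thread, and returns the
/// number of resources still live afterwards.
pub fn live_after_off_thread_drop() -> usize {
    let live = Arc::new(AtomicUsize::new(1));
    finalise_off_thread(Resource { live: Arc::clone(&live) });
    live.load(Ordering::SeqCst)
}
```

Replacing the \lstinline{Arc<AtomicUsize>} with an \lstinline{Rc<Cell<usize>>}
makes the call to \lstinline{finalise_off_thread} fail to compile, because
\lstinline{Rc} is not \lstinline{Send}. A \lstinline{Gc<T>} has no such call
site for the compiler to check, which is the gap that the analysis has to fill.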


\subsection{The Analysis}
\label{sec:fsa}

Finaliser safety analysis (FSA) extends Rust's type system to reject programs whose destructors are not
provably safe to be used as finalisers. At compile-time, a static analysis is
performed over the drop methods of each value used in a \lstinline{Gc}.


\subsubsection{Ensuring thread-safety}

The basic idea is that whenever a type is used in a \lstinline{Gc}, that type's
drop method needs to be checked to ensure it only accesses fields which are
safe to be used inside a finaliser. Performing this check only when such types
are used in \lstinline{Gc} is important as it prevents FSA from breaking
existing Rust programs: drop methods with unsound finalisation behaviour are
not a problem if they are never used in a \lstinline{Gc}.

\begin{figure}[t]
\centering
\begin{minipage}{0.50\textwidth}
\lstinputlisting[language=Rust,firstline=9,lastline=18]{listings/thread_unsafety.rs}
\subcaption{}
\end{minipage}%
\begin{minipage}{0.50\textwidth}
\begin{lstlisting}[language=Rust, numbers=none]
error: `n1` cannot be safely finalized.
    |
11  | let gc1 = Gc::new(n1);
    |                   ^^ has a drop method which cannot
    |                      be safely finalized.
    |
   ::: ../rc.rs:368:18
    |
368 |         unsafe { self.ptr.as_ref() }
    |                  --------
    |                  |
    |                  caused by the expression in `fn drop(&mut)` here because
    |                  it uses a type which is not safe to use in a finalizer.
    |
    = help: `Gc` runs finalizers on a separate thread, so drop methods
            must only use values whose types implement `Send + Sync + FinalizerSafe`.
\end{lstlisting}
\subcaption{}
\end{minipage}
\label{fig:dangling_reference}
\captionof{lstlisting}{A \lstinline{u64} is
  placed on the heap using a \lstinline{Box}, and an immutable reference to it
  is stored inside the \lstinline{Gc<Node>}. Without finalisation, this is
  valid as long as \lstinline{gc1} does not outlive \lstinline{b}. However, the
  use of the references from the \lstinline{drop} method is unsound. This is
  because the collector is likely to schedule the finaliser to run after the
  \lstinline{Box<u64>} is freed when \lstinline{b} goes out of scope. The
  dereference on line 3 would thus access a dangling reference causing a
  use-after-free.}
\end{figure}

The \lstinline{Wrapper} uses a \lstinline{RefCell} to swap the value of
the underlying string (line 11). A \lstinline{RefCell} provides a form of interior
mutability which is not thread-safe (because its \lstinline{borrow()} /
\lstinline{borrow_mut()} methods are non-atomic). In this example, a
\lstinline{Wrapper} can safely be placed inside a \lstinline{Gc}, because the
\lstinline{RefCell} is not used in \lstinline{Wrapper}'s finaliser (line 4).
This is checked by FSA at compile-time.

As with any static analysis, FSA is inherently conservative: some drop methods
are impossible to analyse at compile-time so in these cases \ourgc will err on
the side of caution and reject potentially safe programs. This is most likely
to happen in two situations.

First, a drop method may contain a call to an opaque (i.e.~externally linked)
function for which the compiler does not have the MIR. If a reference to a field
which would be unsafe to use in a finaliser is passed to this function, then FSA
would reject the program. This function call \emph{could} be safe, but FSA has
no way of knowing, and will reject it.

Second, a finaliser may be run on a trait object, whose drop method is called
using dynamic dispatch. At compile-time, it is not possible to know the concrete type
of a trait object, so FSA does not know which drop method to check.

In both cases, the user has the option of explicitly informing the compiler that a
particular drop method is safe to use as a finaliser. We observe that this is
rare in practice.

\subsubsection{Making unordered finalisation sound}
\label{sound_unordered_finalisation}

When \ourgc is compiled with unordered finalisation, it prevents drop methods
from dereferencing fields which point to other \lstinline{Gc} objects.
\autoref{fig:unsound_finalisation_cycle_error} shows the error message that is
displayed when the example in \autoref{fig:unsound_finalisation_cycle} is
compiled. It has identified that the user is trying to access field \lstinline{b},
which contains a \lstinline{Gc} type. We extend \ourgc's \emph{finaliser safety
analysis} (described in \autoref{fsa}) to detect this.

\begin{figure}[t!]
\begin{lstlisting}
error: `Mutex::new(Node {
               name: String::from("a"),
               next: None,
           })` cannot be safely finalized.
  --> src/main.rs:30:21
   |
15 |               Some(n) => {
   |                    -
   |                    |
   |                    caused by the expression here in `fn drop(&mut)` because
   |                    it uses another `Gc` type.
...
30 |       let b = Gc::new(Mutex::new(Node {
   |  _____________________^
31 | |         name: String::from("a"),
32 | |         next: None,
33 | |     }));
   | |______^ has a drop method which cannot be safely finalized.
   |
   = help: `Gc` finalizers are unordered, so this field may have already been dropped.
     It is not safe to dereference.

\end{lstlisting}
    \caption{\jake{todo, replace this example}The compile-time error message shown when attempting to compile the
    example in \autoref{fig:unsound_finalisation_cycle}. FSA identifies that
    another \lstinline{Gc} is being dereferenced inside a finaliser. This is
    unsound when \ourgc does not use ordered finalisation as it may have already
    been collected.}
    \label{fig:unsound_finalisation_cycle_error}
\end{figure}

This is implemented by first adding a negative implementation of
\lstinline{FinalizerSafe} to the \lstinline{Gc<T>} type:

\begin{lstrustsmall}
impl<T> !FinalizerSafe for Gc<T> {}
\end{lstrustsmall}

FSA then uses this to identify the specific field which was unsafely
dereferenced, and generates an error message distinct from those concerning
thread-safety.

However, as explained in \autoref{fsa}, FSA is not complete. It is possible that
a drop method could dereference a \lstinline{Gc} field in a way that FSA could
not detect e.g.~by doing so behind an opaque function call. In such cases
where the MIR for the entire drop method cannot be checked, FSA will err on the
side of caution, favouring soundness by refusing to compile the program.

\subsubsection{How it works}

The first stage of FSA is to identify calls to \lstinline{Gc::new}\footnote{In
\ourgc, a \lstinline{Gc} object can only be created through the
\lstinline{Gc::new} constructor. We mark the definition of this function with a
special label, known as a \emph{diagnostic label}, so that it can be easily
referred to during the FSA phase of Rust compilation.}:

\begin{algorithm}
\begin{algorithmic}
\Function{FinaliserSafetyAnalysis}{$prog$}
    \ForEach {$mir\_body \in prog$}
        \ForEach {$basic\_blocks \in mir\_body$}
            \ForEach {$block \in basic\_blocks$}
                \If {\Call{IsCallToGcConstructor}{$block.terminator$}}
                    \State \Call{CheckCallsiteForDropImpl}{$block.terminator$}
                \EndIf
            \EndFor
        \EndFor
    \EndFor
\EndFunction
\end{algorithmic}
\end{algorithm}

This checks the terminator of every MIR basic block for a call to the \lstinline{Gc<T>::new}
constructor. If found, we check if \lstinline{T} implements \lstinline{Drop}. If
it does, then \lstinline{Gc<T>} needs finalising, so the MIR body for
\lstinline{T}'s drop method is checked for soundness violations. FSA only
considers a drop method sound if fields which are dereferenced implement the
\lstinline{FinalizerSafe} trait. In Rust's MIR terminology,
such a field access would constitute a \emph{place projection}, where a place is
a memory location (or lvalue), and a projection is a field access. A single MIR
statement can contain more than one field projection
(e.g.~\lstinline{self.foo.bar.baz}).

This part of the analysis happens in the \lstinline{CheckCallsiteForDropImpl} function:

\begin{algorithm}
\begin{algorithmic}
\Function{CheckCallsiteForDropImpl}{$callsite$}
    \State $arg\_ty \gets$ \Call{GetTypeOfFirstArg}{$callsite$}
    \If { \NOT \Call{Impls}{$arg\_ty$, $Drop$}}
        \State \Return
    \EndIf
    \State $drop\_body \gets$ \Call{GetDropMirBody}{$arg\_ty$}
    \ForEach {$basic\_blocks \in drop\_body$}
        \ForEach {$block \in basic\_blocks$}
            \ForEach {$statement \in block$}
                \If {\Call{HasPlaceProjection}{$statement$}}
                    \ForEach {$projection \in statement$}
                        \State \Call{CheckProjection}{$projection$}
                    \EndFor
                \EndIf
            \EndFor
        \EndFor
    \EndFor
\EndFunction
\end{algorithmic}
\end{algorithm}

If a projection is found, its type is checked for an implementation of the
\lstinline{FinalizerSafe} trait:

\begin{algorithm}
\begin{algorithmic}
\Function{CheckProjection}{$projection$}
    \Comment{A projection elem is the RHS of a field access.}
    \State $projection\_ty \gets$ \Call{GetType}{$projection.elem$}
    \If{ \NOT \Call{Impls}{$projection\_ty$, $FinalizerSafe$}}
        \State \Return{\Call{Error}{}}
    \EndIf
\EndFunction
\end{algorithmic}
\end{algorithm}

If \lstinline{CheckProjection} discovers an access to a field whose type does
not implement \lstinline{FinalizerSafe}, it reports a compile-time error. This
does not halt the analysis, so multiple lines in drop methods which perform
unsound field accesses will be caught in a single FSA pass.
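
The shape of this pass can be sketched over a toy representation of a drop
method. This is an illustration only: the \lstinline{Projection},
\lstinline{Statement}, and \lstinline{DropBody} types below are hypothetical
simplifications and do not correspond to \rustc's MIR data structures. The
sketch collects every offending field rather than stopping at the first, which
mirrors the behaviour described above.

```rust
/// One field access (a place projection) found in a drop method.
pub struct Projection {
    pub field: &'static str,
    /// Whether the field's type implements `FinalizerSafe`.
    pub finalizer_safe: bool,
}

/// A statement may contain several projections, e.g. `self.a.b`.
pub struct Statement {
    pub projections: Vec<Projection>,
}

/// A toy drop-method body: basic blocks, each a list of statements.
pub struct DropBody {
    pub blocks: Vec<Vec<Statement>>,
}

/// Returns the name of every unsafely accessed field. Errors are
/// collected rather than returned early, so one pass reports them all.
pub fn check_drop_body(body: &DropBody) -> Vec<&'static str> {
    let mut errors = Vec::new();
    for block in &body.blocks {
        for statement in block {
            for projection in &statement.projections {
                if !projection.finalizer_safe {
                    errors.push(projection.field);
                }
            }
        }
    }
    errors
}

/// A hypothetical drop method with two unsound accesses in two blocks.
pub fn example_body() -> DropBody {
    DropBody {
        blocks: vec![
            vec![Statement {
                projections: vec![
                    Projection { field: "value", finalizer_safe: true },
                    Projection { field: "nbr", finalizer_safe: false },
                ],
            }],
            vec![Statement {
                projections: vec![Projection { field: "cache", finalizer_safe: false }],
            }],
        ],
    }
}
```

For \lstinline{example_body()}, the check reports both \lstinline{nbr} and
\lstinline{cache} in a single pass.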


\section{Early Finaliser Prevention}

Unlike RAII-like destructors found in languages such as C++ and Rust, finalisers
are called by a garbage collector non-deterministically. They can run at the
collector's leisure: often this means that they run later than desired but,
in rare cases, an object can be finalised while it is still being used by the
mutator! This is because compiler optimisations -- unaware of the presence of a
GC -- can remove the single reference to an object which is keeping it alive. An
outer object can therefore be considered unreachable while its inner object is
still in use. An unfortunately timed GC cycle could then finalise the outer
object. This can lead to subtle races in programs where
the finaliser interleaves execution with the mutator.

For this reason, VM specifications do not commit to running finalisers at a
specific time. This includes allowing an object's finaliser to be run while the
mutator is potentially still using it. For the reasons outlined in
\autoref{bg:synchronisation}, GC implementations must synchronise access to
objects inside a finaliser. \citet[p.~218]{jones16garbage} suggest this can be
used to defer finalisation until later if a finaliser attempts to acquire an
object lock which is already held by the mutator.

More generally, however, the fundamental problem is that the compiler optimises
away the reference to the object too soon. The .NET runtime used by C\#
provides a \lstinline{GC.KeepAlive} method as a solution to this.
\lstinline{GC.KeepAlive} is an opaque, empty method which the compiler cannot
optimise away. The idea is that a reference to an object can be passed to
\lstinline{GC.KeepAlive}, ensuring it lives long enough so that the collector
does not deem it unreachable and finalise it too soon. This mitigation is
limited, however, as it is up to the user to call \lstinline{GC.KeepAlive} when
they require it.

A fundamental assumption in Rust's destructor semantics is that dropping a value
is the last thing to happen to it. The Rust compiler prevents using a value
after it has been dropped as this would cause unsoundness. For \lstinline{Gc}
values in \ourgc, the same must be true for finalisers. If a finaliser is able
to run before the mutator has finished using it, this would also be
unsound.

In \autoref{bg:early_finalisation}, we explain how finalisers can run earlier
than expected because compiler optimisations -- unaware of the presence of a GC
-- can cause GC objects to become unreachable earlier than expected. If this is
paired with an unfortunately timed GC cycle, the object's finaliser could run
while the object is still in use by the mutator. Rust and \ourgc are no
different: the Rust compiler is allowed to perform any optimisation that does
not change the observable behaviour of the program, and such optimisations are
not aware of the retro-fitted collector.

A finaliser which runs early can cause finalisation code to interleave
unexpectedly with the mutator~\citep{boehm03destructors}. But, even worse, in \ourgc early
finalisation can even lead to a memory safety violation. Consider the following
example, which shows how a finaliser which runs early could cause a
use-after-free:

\begin{lstlisting}[
  language=Rust,
  caption={An example of unsoundness caused by a finaliser running earlier than expected.},
  label={fig:early_finaliser_unsoundness_example}]
fn main()  {
    let root = Gc::new(Box::new(123));
    let inner: &usize = &**root;

    GcAllocator::force_gc();
    thread::sleep(time::Duration::from_secs(1));

    // Invalid read
    read(inner);
}
\end{lstlisting}

Assuming that the compiler clobbers the reference to the \lstinline{Gc} stored in
variable \lstinline{root}, this program can be represented as follows:

\begin{center}
\includegraphics[width=0.75\textwidth]{images/early_finalisation}
\end{center}

In this program, semantically, both \lstinline{root} and \lstinline{inner} live
until the end of \lstinline{main}. However, the compiler may decide to reuse the
register holding the reference in \lstinline{root} any time after line 3 as it is
no longer used. An unfortunately timed GC cycle which happens immediately
afterwards would consider the \lstinline{Gc} object unreachable. Its finaliser
will then be run, freeing the \lstinline{Box}. This would happen even though there is still
a reference (\lstinline{inner}) to the inner \lstinline{Box} value. This
reference is now a dangling reference, and its use on line 9 would constitute a
use-after-free.

The possibility of early finalisation has led many VMs to specify that finalisers
can run at any time -- even earlier than when an object becomes unreachable
(see \autoref{bg:early_finalisation}). One way of preventing early
finalisation in \ourgc would be to prevent \lstinline{Gc}
objects from owning non-garbage-collected objects, but this would render
\ourgc almost unusable. Fortunately, we can do better.

\subsection{Solution (2): Early Finaliser Prevention}

\ourgc inserts barriers that prevent optimisations or register allocation from
`tricking' the GC into collecting values before they are dead, and does so in a
way that allows obviously pointless barriers to be elided.

\ourgc takes advantage of two observations: Rust already inserts calls to \lstinline{drop} at
the same point in a function where we want to insert compiler barriers;
and we only need to insert barriers for variables of type \lstinline{Gc}.
However, since \lstinline{Gc} is a \lstinline{Copy} type, Rust prevents
us from adding a \lstinline{drop} method to \lstinline{Gc}.

Fortunately, since \ourgc already alters the Rust compiler, it is easy
to add a further modification. I thus modify the Rust
compiler to allow for simultaneous implementation of \lstinline{Copy} and
\lstinline{Drop} for \lstinline{Gc} types only, with the following drop
implementation:

\begin{lstrustsmall}
impl<T: ?Sized> Drop for Gc<T> {
    fn drop(&mut self) {
        unsafe {
            COMPILER_BARRIER(self)
        }
    }
}
\end{lstrustsmall}

The \lstinline{COMPILER_BARRIER(self)} call includes inline assembly using Rust's
\lstinline{asm!} macro to create a read of the \lstinline{Gc}'s \lstinline{self}
reference after a compiler barrier. This is platform specific: for x86 it
translates to the following:

\begin{lstlisting}
asm("":::"memory")
\end{lstlisting}

Although the compiler barrier does not contain platform instructions, its
format is still platform dependent: other platforms such as AArch64 may
require a slightly different \lstinline{asm} statement.

However, a compiler barrier by definition prevents the compiler from
performing some of its normal optimisations: it is an expensive solution to a
rare problem. In our performance analysis, it caused a slowdown of roughly 2--3\%. In
\autoref{optimising_early_finalisers} I describe how I optimise this approach,
removing barriers where it can be statically determined that they are
unnecessary.
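
As a rough illustration of the idea, and not the \lstinline{asm!}-based barrier
that \ourgc uses, the following sketch builds a similar `keep alive' helper from
stable standard-library pieces. \lstinline{std::sync::atomic::compiler_fence}
stops the compiler reordering memory accesses across the barrier, and
\lstinline{std::hint::black_box} asks it to treat the pointer as observed at
that point. The name \lstinline{keep_alive} is hypothetical, and
\lstinline{black_box} is only a best-effort hint, so this should be read as a
sketch of the intent rather than a guarantee.

```rust
use std::hint::black_box;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical helper: make `value`'s address look "used" at the call site,
/// so the compiler has a reason to keep it in a register or stack slot.
fn keep_alive<T: ?Sized>(value: &T) {
    // Prevent the compiler from moving memory accesses across this point.
    compiler_fence(Ordering::SeqCst);
    // Best-effort hint that the pointer is observed here.
    let _ = black_box(value as *const T);
}

/// Mirrors the shape of the earlier example: read through an inner reference,
/// then keep the owning value visibly alive until after the read.
fn read_then_keep_alive() -> usize {
    let root = Box::new(123_usize);
    let inner: &usize = &*root;
    let seen = *inner;
    // Without a GC this has no visible effect; with a conservative collector
    // the aim would be to keep `root` findable until this point.
    keep_alive(&root);
    seen
}

fn main() {
    assert_eq!(read_then_keep_alive(), 123);
}
```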

Early finalisation prevention (\autoref{early_finaliser_prevention})
overapproximates the places where early finalisation can happen, which can have
a significant impact on performance. Fortunately, the finaliser elision
optimisation in \autoref{finaliser_elision} shows that many finalisers never
need to be called, at which point we also no longer have to worry about early
finalisation! Where this is the case, we are able to remove, during
compilation, the barrier-containing drop calls for the relevant
\lstinline{Gc<T>} pointers.

All \lstinline{Gc} values have drop methods with barriers by default. During
compilation, barriers which we can prove are unnecessary are removed. This is
done once the Rust compiler has generated its mid-level IR (MIR). Like finaliser
safety analysis (\autoref{fsa}), we perform an in-order traversal on the control
flow graph represented by the MIR for each function with the following
algorithm:

\begin{algorithm}
\begin{algorithmic}
\Function{BarrierRemoval}{$prog$}
    \ForEach {$mir\_body \in prog$}
        \ForEach {$basic\_blocks \in mir\_body$}
            \ForEach {$block \in basic\_blocks$}
                \If {\Call{CallsDrop}{$block.terminator$}}
                    \State $arg \gets$ \Call{GetFirstArg}{$block.terminator$}
                    \State $arg\_ty \gets$ \Call{GetType}{$arg$}
                    \If {\Call{IsGC}{$arg\_ty$}}
                        \If {\NOT \Call{NeedsFinaliser}{$arg\_ty$}}
                            \State \Call{RemoveDrop}{$block$}
                        \EndIf
                    \EndIf
                \EndIf
            \EndFor
        \EndFor
    \EndFor
\EndFunction
\end{algorithmic}
\end{algorithm}

This iterates over all drop calls in the entire program, identifying those
whose argument is a \lstinline{Gc<T>}. Such a drop call is removed if the
\lstinline{Gc} reference points to an object which does not need finalising. The drop call
is removed by replacing the terminator of the block which calls drop with the
terminator at the end of the drop body:

\begin{algorithm}
\begin{algorithmic}
\Function{RemoveDrop}{$block$}
    \State $drop\_mir \gets$ \Call{GetDropMirBody}{$block.terminator$}
    \State $last\_block \gets$ \Call{GetLastBlock}{$drop\_mir$}
    \State $block.terminator \gets last\_block.terminator$
\EndFunction
\end{algorithmic}
\end{algorithm}

After this pass, we call the Rust compiler's existing \emph{simplify mir} pass,
which tidies up the control flow graph by removing the empty blocks which were
created as a result of drop removal.
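
The pass can be modelled in a few lines of ordinary Rust. The sketch below is a
toy: \lstinline{Ty}, \lstinline{Terminator}, and \lstinline{Body} are
hypothetical stand-ins for the corresponding \rustc data structures, and a
removed drop is replaced by a plain jump to its successor rather than by the
terminator of the drop body.

```rust
/// Toy stand-ins for MIR types; the real compiler data structures differ.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Gc { needs_finaliser: bool },
    Other,
}

#[derive(Clone, Debug, PartialEq)]
enum Terminator {
    /// Drop a value of type `ty`, then continue at block `target`.
    Drop { ty: Ty, target: usize },
    Goto(usize),
    Return,
}

struct Body {
    /// One terminator per basic block.
    blocks: Vec<Terminator>,
}

/// Replace every drop of a `Gc` that needs no finaliser with a jump to the
/// drop's successor block. Returns the number of drops removed.
fn remove_barriers(prog: &mut [Body]) -> usize {
    let mut removed = 0;
    for body in prog.iter_mut() {
        for term in body.blocks.iter_mut() {
            let target = match term {
                Terminator::Drop { ty: Ty::Gc { needs_finaliser: false }, target } => *target,
                _ => continue,
            };
            *term = Terminator::Goto(target);
            removed += 1;
        }
    }
    removed
}

/// A single function body with three drops, only one of which is removable.
fn example_program() -> Vec<Body> {
    vec![Body {
        blocks: vec![
            Terminator::Drop { ty: Ty::Gc { needs_finaliser: false }, target: 1 },
            Terminator::Drop { ty: Ty::Gc { needs_finaliser: true }, target: 2 },
            Terminator::Drop { ty: Ty::Other, target: 3 },
            Terminator::Return,
        ],
    }]
}

fn main() {
    let mut prog = example_program();
    let removed = remove_barriers(&mut prog);
    println!("removed {} drop(s): {:?}", removed, prog[0].blocks);
}
```

Only the first block changes: the second \lstinline{Gc} still needs its
finaliser, and the third drop is not of a \lstinline{Gc} at all.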


\section{The Gc smart pointer type}

\ourgc introduces a new smart pointer type, \lstinline{Gc<T>}, which provides
shared ownership of a value of type \lstinline{T} allocated in the heap and
managed by a garbage collector. Consider a simple example and its corresponding
representation in memory:

\begin{minipage}[c]{0.5\linewidth}
\begin{lstrustsmall}
use std::gc::Gc;

fn main() {
    let a = Gc::new(123);
}
\end{lstrustsmall}
\end{minipage}
\begin{minipage}[c]{0.5\linewidth}
    \includegraphics[width=1\textwidth]{images/alloy_basic_gc_1}
\end{minipage}

This creates a garbage collected object which contains the \lstinline{u64} value
\lstinline{123}. A \lstinline{Gc}'s data is stored in a \lstinline{GcBox}
internally. \lstinline{GcBox}es are managed by the collector, though this is not visible to the
user.

\lstinline{Gc} references are copyable (i.e.~they implement the \lstinline{Copy}
trait), with copied references pointing to the same object in the heap:

\begin{minipage}[c]{0.5\linewidth}
\begin{lstrustsmall}
fn main() {
    let a = Gc::new(123);
    let b = a;
}
\end{lstrustsmall}
\end{minipage}
\begin{minipage}[c]{0.5\linewidth}
    \includegraphics[width=1\textwidth]{images/alloy_basic_gc_2}
\end{minipage}

This makes \lstinline{Gc} more ergonomic to use than \lstinline{Rc}, as there is
no need to call \lstinline{clone} on a \lstinline{Gc} to obtain another
reference to its data.
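
For comparison, the following standard-library-only sketch shows the explicit
\lstinline{clone} that \lstinline{Rc} requires for a second reference. The
function name is illustrative.

```rust
use std::rc::Rc;

/// Returns the strong count after taking a second reference to an `Rc`.
fn second_reference_count() -> usize {
    let a = Rc::new(123);
    // `let b = a;` would *move* `a`; a second owner needs an explicit clone.
    let b = Rc::clone(&a);
    assert_eq!(*a, *b);
    Rc::strong_count(&a)
}

fn main() {
    println!("strong count: {}", second_reference_count());
}
```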

The \lstinline{GcBox} referenced by a \lstinline{Gc} is guaranteed not to be
freed while there are still references to it. When there are no longer any
references, the collector will reclaim it at some point in the future. The
garbage collector runs intermittently in the background, so \lstinline{Gc}
objects may live longer than they need to.

\subsubsection{Dereferencing}

A \lstinline{Gc<T>} dereferences to \lstinline{T} with the dereference
(\lstinline{*}) operator:

\begin{minipage}[c]{0.5\linewidth}
\begin{lstrustsmall}
fn main() {
    let a = Gc::new(123);
    let b = *a;
    print(b);
}

fn print(int: u64) {
    println!("{}", int);
}
\end{lstrustsmall}
\end{minipage}
\begin{minipage}[c]{0.5\linewidth}
    \includegraphics[width=1\textwidth]{images/alloy_basic_gc_3}
\end{minipage}

Here, the value can be copied out of the \lstinline{Gc} into \lstinline{b}
because \lstinline{u64}s are copyable. The \lstinline{Gc} type also allows the
dot operator to be used for calling methods of type \lstinline{T} on a
\lstinline{Gc<T>}:

\begin{lstrustsmall}
struct Wrapper(u64);

impl Wrapper {
    fn foo(&self) {
        ...
    }
}

fn main() {
    let a = Gc::new(Wrapper(123));
    a.foo();
}
\end{lstrustsmall}

\subsection{Mutation}

There is no way to mutate, or obtain a mutable reference (\lstinline{&mut T})
to, the contents of a \lstinline{Gc<T>} once it has been allocated. This is because mutable
references must not alias with any other references, and there is no way to know
at compile-time whether there is only one \lstinline{Gc} reference to the data.

As with other shared ownership types in Rust, interior mutability
(\autoref{intmut}) must be used when mutating the contents inside a
\lstinline{Gc}:

\begin{lstrustsmall}
fn main() {
    let a = Gc::new(RefCell::new(123));
    *a.borrow_mut() = 456; // Mutate the value inside the GC
}
\end{lstrustsmall}
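
The same pattern can be tried out with the standard library alone. The sketch
below substitutes \lstinline{Rc} for \lstinline{Gc} (an assumption made purely
so that the example runs without \ourgc) and shows \lstinline{RefCell}
rejecting a second mutable borrow at run-time.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Mutates a shared value through `RefCell`, and reports whether a second,
/// overlapping mutable borrow was rejected at run-time.
fn mutate_shared() -> (i32, bool) {
    let a = Rc::new(RefCell::new(123));
    let b = Rc::clone(&a);

    // Mutate through one reference.
    *a.borrow_mut() = 456;

    // While one mutable borrow is live, another is refused.
    let guard = a.borrow_mut();
    let second_borrow_rejected = b.try_borrow_mut().is_err();
    drop(guard);

    // The change is visible through the other reference.
    let value = *b.borrow();
    (value, second_borrow_rejected)
}

fn main() {
    println!("{:?}", mutate_shared());
}
```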


\subsubsection{Conservative GC}

\label{ourgc:soundness}

\ourgc is a conservative GC (\autoref{bg:css}), which means that by nature, it
is unsound. This is because, technically speaking, the way conservative GC works
violates the rules of most languages, most compilers, and most operating
systems. In very rare cases, compilers have been known to perform
optimisations which can obfuscate pointers from the
collector~\citep{chromium20cssbug}. Fortunately, the ubiquity of conservative GCs in industrial-strength
VMs means that in practice it is well supported. If one can accept this caveat,
\ourgc is otherwise correct-by-design provided that users do not hide,
accidentally or otherwise, pointers from the GC. Programming techniques which
rely on \emph{pointer obfuscation}~\citep{boehm96simple} are therefore not
allowed in \ourgc. This rules out the use of certain data structures such as XOR
lists.

When \ourgc performs a collection, it must first identify the roots from which
the rest of the object graph is traced. Such roots exist on the call stack, in
registers, and in segments of the program which store global values. When a
collection is scheduled, the BDWGC spills register values to the stack so that
their contents can be scanned for pointers along with the rest of the
stack~\citep{boehm88garbage}. The call stack is exhaustively examined for
possible pointers to instances of objects: each aligned word on the stack
is checked to see whether it points to an instance of an object and, if it does,
that object is considered a root. \autoref{fig:roots} shows an example of what
\ourgc considers roots to garbage collected objects.
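
The word-by-word check can be illustrated with a toy model. The sketch below is
not how the BDWGC is implemented: \lstinline{Block} and
\lstinline{conservative_roots} are hypothetical names, the `stack' is just a
slice of integers, and interior pointers are assumed to count as references.

```rust
/// A heap block known to the (toy) collector: `[start, start + len)`.
struct Block {
    start: usize,
    len: usize,
}

/// Treat every word in `stack` as a potential pointer and return the indices
/// of the blocks that at least one word points into.
fn conservative_roots(stack: &[usize], heap: &[Block]) -> Vec<usize> {
    let mut roots = Vec::new();
    for &word in stack {
        for (idx, block) in heap.iter().enumerate() {
            let points_into = word >= block.start && word - block.start < block.len;
            if points_into && !roots.contains(&idx) {
                roots.push(idx);
            }
        }
    }
    roots
}

fn example() -> Vec<usize> {
    let heap = [
        Block { start: 0x1000, len: 16 },
        Block { start: 0x2000, len: 32 },
        Block { start: 0x3000, len: 8 },
    ];
    // A pretend stack: a small integer, an interior pointer, an exact
    // pointer, and a value one past the end of the last block.
    let stack = [42, 0x2008, 0x1000, 0x3008];
    conservative_roots(&stack, &heap)
}

fn main() {
    println!("root blocks: {:?}", example());
}
```

Because the scan cannot tell integers from pointers, any word whose value
happens to fall inside a block keeps that block alive.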

\begin{figure}
% Hack to get the listing in the figure
\newsavebox{\rootlisting}
\begin{lrbox}{\rootlisting}
\begin{lstrustsmallnonums}
// `a` exists on the stack.
let a = Gc::new(1);
let b = a; // obtain copy


let c = Gc::new(Gc::new(2));
// obtain a rust (&) ref
let d = c.as_ref();
\end{lstrustsmallnonums}
\end{lrbox}
    \centering
    \subfloat[\centering Roots on the stack]{\usebox{\rootlisting}}
    \hfill
    \subfloat[\centering Representation in memory]{
        \includegraphics[width=0.5\textwidth]{images/gc_roots.pdf}
    }
    \caption{An example showing values on the stack which are considered
    roots to \lstinline{GcBox}es.}
    \label{fig:roots}
\end{figure}

\begin{figure}[t]
\begin{lstrustsmall}
use std::gc::GcAllocator;

#[global_allocator]
static ALLOCATOR: GcAllocator = GcAllocator;

fn main() {
    ...
}
\end{lstrustsmall}
    \caption{Setting the global allocator to use the BDWGC in \ourgc using
    Rust's \lstinline{global_allocator} attribute.}
\label{fig:allocator}
\end{figure}

\subsubsection{Garbage collected objects in other heap objects}

In \ourgc, references to garbage collected objects can be stored in
traditional, non-garbage-collected Rust objects:

\begin{lstrustsmall}
fn main() {
    let mut v = Vec::new();
    v.push(Gc::new(1));
    v.push(Gc::new(2));
}
\end{lstrustsmall}

Here, the vector contains two references to garbage collected objects which are
managed by \ourgc. Even though the vector itself is not garbage collected, its
backing store must still be traced during a collection in order to locate GC
objects. To support this, every allocation in an \ourgc program must use the BDW
allocator -- even those which are not garbage collectable. This is because the
BDW allocator stores bookkeeping information such as mark bits and the memory
block size which are needed during a collection.


Rust provides a convenient way to set the global allocator for a program, and
because \ourgc extends the standard library to include the BDW allocator,
programs can easily be made \ourgc-compliant. This is shown in
\autoref{fig:allocator}.

This ensures that every heap allocation (except those created using
\lstinline{Gc::new()}) is made using the BDW allocator's
\lstinline{GC_malloc_uncollectable} function. This allocates a
non-garbage-collected block which the collector is aware of and can scan for
pointers to other garbage collected objects. As with the call stack, the BDWGC
scans all allocated blocks in memory that are reachable from the root-set
conservatively word-by-word.

\subsubsection{Pointer obfuscation}

\label{ourgc:pointer_obfuscation}

As a systems programming language, Rust permits operations directly on pointers.
This includes: casting pointers to and from integer types; pointer arithmetic;
and bitwise operations on pointers. All three of these operations can be used to
obfuscate a pointer, hiding it from the collector and causing it to
erroneously determine that an object is unreachable. The user must not obfuscate
any pointers in this way.

\begin{figure}
\begin{lstrustsmall}
fn make_obfuscated() -> usize {
    let a = Gc::new(123_u64);

    // Get a raw pointer to the underlying allocation
    let aptr = a.as_ref() as *const u64;

    // Use the bitwise NOT operator to obfuscate the pointer
    return !(aptr as usize)
}

fn main() {
    let obf = make_obfuscated();

    ...

    // GC cycle here. The `Gc` is potentially unreachable!

    let reify = (!obf) as *const u64;

    // `unsafe` is needed to dereference a raw pointer
    let value = unsafe { *reify };
}
\end{lstrustsmall}
\caption{An example of how pointer obfuscation in Rust can hide a pointer from
    the collector. \ourgc uses the BDWGC to conservatively scan the stack, so
    this allocation could be missed if its only remaining reference is the one
    obfuscated on line 8. Fortunately, however, it requires an unsafe block to
    dereference (line 21).}
\label{lst:obfuscation}
\end{figure}

\subsubsection{Pointer casting and word alignment}

For \ourgc to be able to locate pointers during marking, all references, raw
pointers, and machine-word sized integers (\lstinline{usize}) must be
word-aligned. This is because \boehm scans the stack and heap blocks for pointers
word-by-word, so non-word-aligned values may be missed.

It's easy to see that \ourgc needs to be able to identify objects via references
or raw pointers, and thus requires them to be word-aligned. The reason this is
also true for \lstinline{usize}s is more subtle, and is necessary because it's
possible in Rust to cast between raw pointers and word-sized integers using the
\lstinline{as} keyword. Consider the following:

\begin{lstrustsmall}
let gc = Gc::new(Value);
let gc_ref = gc.as_ref(); // Get a &Value reference.
let ptr_to_int = (gc_ref as *const Value) as usize;
\end{lstrustsmall}

We first obtain a reference to the \lstinline{Value} stored inside the
\lstinline{Gc} before casting it to a raw pointer, and then a \lstinline{usize}
(a word-sized unsigned integer). If \lstinline{gc} and \lstinline{gc_ref} were to
go out of scope, the \lstinline{ptr_to_int} is enough to keep the GC'd object
alive, because when \ourgc scans the stack, it would correctly identify that
\lstinline{ptr_to_int} looks like a pointer to a valid GC object.
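
The alignment requirement can be made concrete without a collector. The sketch
below uses only the standard library; the \lstinline{Packed} struct is a
hypothetical example of a layout in which a \lstinline{usize} field is not
guaranteed to be word-aligned, and which a word-by-word scan could therefore
miss.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

/// A packed layout places `addr` directly after the one-byte tag.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    addr: usize,
}

/// Byte offset of the `addr` field from the start of a `Packed` value.
fn packed_addr_offset() -> usize {
    let p = Packed { tag: 0, addr: 0 };
    // `addr_of!` avoids creating a reference to a possibly unaligned field.
    let field = addr_of!(p.addr) as usize;
    let base = addr_of!(p) as usize;
    field - base
}

fn main() {
    // Pointers and `usize` share a size and alignment on mainstream targets.
    assert_eq!(size_of::<usize>(), size_of::<*const u64>());
    assert_eq!(align_of::<usize>(), align_of::<*const u64>());

    // Round-tripping through `usize` preserves the address.
    let value = Box::new(7_u64);
    let as_int = &*value as *const u64 as usize;
    assert_eq!(as_int as *const u64, &*value as *const u64);

    // In the packed struct the field sits at offset 1, so it is not
    // guaranteed to start on a word boundary.
    assert_eq!(packed_addr_offset(), 1);
}
```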

\section{The collector}
\label{ourgc:collector}

\ourgc uses the Boehm-Demers-Weiser GC (\boehm) as the collector
implementation~\citep{boehm88garbage}. \boehm is a conservative mark-sweep collector. It
scans the Rust program's call stack conservatively to look for pointers to
garbage-collected objects when marking. The rest of this section describes the
necessary changes we made to \boehm.

\subsection{Disable parallel collection}

The BDWGC uses parallel collector threads for both the mark and sweep phases. We
disable these because, with concurrent mutator threads, we found the cost of
lock contention over heap allocation to be too high. \jake{TODO: explain}

\subsection{Thread-local storage support}

\boehm does not provide a way for us to scan thread-locals for pointers, so we
provide a solution to this for both the POSIX thread-local \lstinline{specific}
API, and thread-locals which use fast compiler-generated TLS.
\jake{Realistically, people will only use the latter. So it might not be worth
even mentioning the POSIX stuff}.

By default, Rust uses LLVM's TLS implementation where thread-local data is
stored in the \lstinline{PT_TLS} segment of the ELF binary. This must be considered part of
the root-set during collection. So that \boehm can scan this, we must be able
to locate the \lstinline{PT_TLS} block for each thread when it is suspended during GC,
and then add this to the range of values which need marking. Each thread that
is suspended when a collection is scheduled calls
\lstinline{dl_iterate_phdr(3)} to get the start and end of its own range in the
\lstinline{PT_TLS} segment. These ranges are then scanned during marking. We must do this
at each collection, rather than once at start-up, because this segment can grow
and shrink dynamically during the course of the application, and threads can be
spawned or killed in-between collections.

\subsection{Off-thread finalisation API}

We introduce new malloc functions in \boehm for allocating \lstinline{Gc<T>}
objects which require finalising on a separate, dedicated finalisation thread
(\lstinline{GC_buffered_finalize_malloc}, \lstinline{GC_posix_memalign}, \lstinline{GC_memalign}).

These functions allocate objects where the first word in the object points to
their finaliser (specifically, a function pointer to \lstinline{drop_in_place} for the
\lstinline{T} in \lstinline{Gc<T>}). This function pointer must also be tagged in
order to differentiate it from an empty block (as \boehm uses an optimisation
where the first word of each empty block is used to create a threaded freelist
implementation) \jake{probably unnecessary detail}.

During the sweep phase of a collection, a pointer to each unreachable
finalisable object is added to a \emph{finalisation buffer}. A separate
finalisation thread goes through these objects in the buffer and finalises
them, before deallocating the buffer entirely. This thread is suspended as with
other mutator threads when a GC pause happens, so no synchronisation is needed
between adding objects to the buffer and processing objects already in the
buffer. This thread is spawned lazily depending on whether a finalisable object
exists. If a program contains no finalisable objects, no finalisation thread is
spawned.
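
The overall shape of a lazily spawned finalisation thread can be sketched in
plain Rust. This is an illustration under different assumptions from the
\boehm changes described above: the hypothetical \lstinline{FinaliserQueue}
uses a standard-library channel for synchronisation instead of relying on
threads being suspended during a GC pause, and finalisers are modelled as boxed
closures rather than tagged function pointers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

type Finaliser = Box<dyn FnOnce() + Send>;

/// Hypothetical queue which hands finalisers to a dedicated thread.
struct FinaliserQueue {
    worker: Option<(Sender<Finaliser>, JoinHandle<()>)>,
}

impl FinaliserQueue {
    fn new() -> Self {
        FinaliserQueue { worker: None }
    }

    /// Whether the finalisation thread has been spawned yet.
    fn is_spawned(&self) -> bool {
        self.worker.is_some()
    }

    /// Queue a finaliser, spawning the thread on first use.
    fn push(&mut self, f: Finaliser) {
        let (tx, _) = self.worker.get_or_insert_with(|| {
            let (tx, rx) = channel::<Finaliser>();
            let handle = thread::spawn(move || {
                // Runs until every sender has been dropped.
                for finaliser in rx {
                    finaliser();
                }
            });
            (tx, handle)
        });
        tx.send(f).expect("finaliser thread exited early");
    }

    /// Close the queue and wait for all queued finalisers to run.
    fn shutdown(self) {
        if let Some((tx, handle)) = self.worker {
            drop(tx);
            handle.join().expect("finaliser thread panicked");
        }
    }
}

/// Queues `n` finalisers; returns whether a thread existed before any were
/// queued, and how many finalisers ran.
fn run_example(n: usize) -> (bool, usize) {
    let ran = Arc::new(AtomicUsize::new(0));
    let mut queue = FinaliserQueue::new();
    let spawned_before = queue.is_spawned();
    for _ in 0..n {
        let ran = Arc::clone(&ran);
        queue.push(Box::new(move || {
            ran.fetch_add(1, Ordering::SeqCst);
        }));
    }
    queue.shutdown();
    (spawned_before, ran.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", run_example(3));
}
```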

\subsection{Parallel mutator threads}

\jake{This needs moving somewhere more sensible as it describes \boehm functionality as-is.}

In order to support thread-safe \lstinline{Gc}s in \ourgc, the BDWGC must be
able to scan each thread's call stack for roots. I extend the Rust compiler to
register newly spawned threads with BDWGC's collector, and to unregister them
when they are destroyed.

\ourgc relies on the BDWGC's signal spin implementation to come to a GC
safepoint. That is, when a mutator thread comes under allocation pressure and
needs to schedule a GC, the BDWGC will send a SIGPWR signal to each registered
thread and have them spin in a signal handler while the collection cycle takes
place.

The main disadvantage of this approach is that it makes use of
implementation-defined behaviour because it relies on the target OS's mechanism
for pausing threads. The BDWGC provides implementations for most platforms, but
it is not portable. An implementation where Rust inserts thread pause safepoints
at appropriate locations would largely solve these issues, though at
the expense of considerable implementation effort.

\section{Finaliser Safety Analysis}

The decision to make \ourgc a conservative GC was relatively simple. A much
more difficult question is: what should we do with drop methods and finalisers?
There are, broadly speaking, three possible design choices: (1) universally
call drop methods from finalisers and accept that this undermines soundness;
(2) require programmers to manually implement a finaliser for each type they
wish to GC and accept the resulting boilerplate; (3) analyse drop methods and
only allow those determined to be safe to be used as finalisers. To the
best of our knowledge, all current GCs for Rust that support finalisers
\laurie{do i remember correctly that some, maybe luster, don't support
finalisers?} \jake{Luster (now gc-arena) does support finalisation \lstinline{https://github.com/kyren/gc-arena/blob/64ab98785417dd8b82737e6c34a80fb6e0f46f87/src/arena.rs#L323}} use the second approach.

\ourgc introduces the novel concept of \emph{Finaliser Safety Analysis} (FSA)
as a way of realising the third approach. FSA extends Rust's type rules to
reject unsafe drop methods when used as finalisers, requiring the user to
either override the check, or to implement a separate finaliser. Since FSA is
designed to be sound (i.e.~without false positives), a practical challenge is
to make it accept enough safe drop methods to be useful. In this section we
explain the motivation for FSA, its design, and its implementation in \rustc.

\laurie{remember to mention auto-traits and type coherence here}

\jake{Expanding here on the limitations of our FSA approach by re-using Drop}
The major limitation of our approach to using \lstinline{Drop} as a GC
finaliser is that it can cause breaking changes for upstream crate authors who
were not aware their types' drop methods were being used downstream inside a
GC. If they change the drop method of one of their types, unaware of
the consequences this has for GC, then a downstream crate may no longer compile
because FSA rejects the new drop method. This means that breakages are possible without any
changes to clearly delineated API boundaries such as function signatures. A
separate \lstinline{Finalize} trait would solve this, but at the cost of a lot
of boilerplate code.

\subsection{Which drop methods are safe finalisers?}
\label{finaliser_thread}


\subsubsection{Automating finaliser safety analysis}

Finaliser safety analysis is performed automatically, with no manual
intervention from the user. First, I introduce a new auto trait used as a marker for
finaliser safety called \lstinline{FinalizerSafe} (an introduction to auto
traits is provided in \autoref{sendsync}). As an auto trait, \lstinline{FinalizerSafe} is
implemented for all types by default in Rust, so in the Rust standard library, I
explicitly remove the implementation of \lstinline{FinalizerSafe} from types
which do not already implement \lstinline{Send} and \lstinline{Sync}.

The Rust compiler is then extended to perform FSA. The basic idea is that
whenever a type is used in a \lstinline{Gc}, that type's drop method needs to be
checked to ensure it doesn't access a field which is not
\lstinline{FinalizerSafe}. Performing this check only when such types are used
in \lstinline{Gc} is important as it prevents FSA from breaking existing Rust
programs: drop methods with unsound finalisation behaviour are not a problem if
they are never used in a \lstinline{Gc}.

\section{Finalisers and Rust references}


\subsection{Preventing dangling references with the \bor}

\label{sec:bor}

\ourgc checks that programs adhere to the \bor at compile-time, throwing an
error for those programs which violate it. Attempting to compile the earlier
example would result in the following error message:


\section{Rust destructors}

\label{destructors_detailed}

To understand the design decisions that \ourgc makes surrounding finalisation,
some background is needed on Rust's destructors.

Destructors in Rust were briefly introduced in \autoref{bg:basic_destructors} as
a way of running cleanup code when an initialised variable or temporary goes out
of scope. A destructor is run automatically at the end of the scope for values
which implement the \lstinline{Drop} trait. Consider the following example,
which uses a Rust destructor to close a file descriptor:

\begin{lstrustsmall}
struct FileDescriptor {
    fd: u64
}

impl Drop for FileDescriptor {
    fn drop(&mut self) {
        self.close();
    }
}

fn main() {
    let f = FileDescriptor { fd: 1 };
} // FileDescriptor::drop called.
\end{lstrustsmall}

The file descriptor \lstinline{f} is destructed (or \emph{dropped}) at the end
of the \lstinline{main} function, where it is no longer in scope. The ability to
drop objects is a key component of Rust's ownership semantics, and is used
extensively in the standard library.

A struct which has a drop implementation may have fields which also need
dropping. For example, consider a \lstinline{FileBuffer}, which has a field of
type \lstinline{FileDescriptor}:

\begin{lstrustsmall}
struct FileBuffer {
    descriptor: FileDescriptor,
}

impl Drop for FileBuffer {
    fn drop(&mut self) {
        self.flush();
    }
}
\end{lstrustsmall}

Here, both the \lstinline{FileBuffer} and its field \lstinline{FileDescriptor}
have drop methods which need running. Rust will automatically insert calls
to drop them both when a \lstinline{FileBuffer} value goes out of scope. In Rust
terminology, a value is considered dropped once its drop method, and all drop
methods belonging to its fields, have been run.

\subsubsection{Rust drop order}

\label{rust_drop_order}

Drop methods are used by Rust programmers for situations such as releasing
locks. In such cases, the order in which values are dropped is vital for
program correctness.

Variables and temporaries are dropped in reverse declaration order. For example:

\begin{lstrustsmall}
fn main() {
    let s1 = String::from("s1");
    let s2 = String::from("s2");
    let s3 = String::from("s3");
}
\end{lstrustsmall}

At the end of \lstinline{main}, \lstinline{s3} would be dropped first, followed
by \lstinline{s2}, and finally \lstinline{s1}.

Rust specifies that fields are dropped in declaration order. For example,
consider the following struct definition:

\begin{lstrustsmall}
impl Drop for S {
    fn drop(&mut self) {
        println!("Dropping S");
    }
}

struct S  {
    a: String,
    b: u64, // u64 does not implement the `Drop` trait.
    c: Vec<bool>,
}
\end{lstrustsmall}

\lstinline{S} contains two fields (\lstinline{a} and \lstinline{c})
which also need dropping. Rust will run \lstinline{S}'s drop method first, then
drop the field \lstinline{a}, and then the field \lstinline{c}.
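
Both ordering rules can be observed directly. In the following sketch,
\lstinline{Noisy} and \lstinline{Outer} are hypothetical types which record
their names in a shared log when dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// A struct with its own drop method and two droppable fields.
struct Outer {
    a: Noisy,
    c: Noisy,
    log: Log,
}

impl Drop for Outer {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Outer");
    }
}

/// Returns the order in which the values below were dropped.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let noisy = |name| Noisy { name, log: Rc::clone(&log) };
    {
        let _v1 = noisy("v1");
        let _v2 = noisy("v2");
        let _s = Outer { a: noisy("a"), c: noisy("c"), log: Rc::clone(&log) };
    } // Variables are dropped in reverse declaration order here.
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

The log shows \lstinline{Outer}'s own drop method running before its fields,
the fields following in declaration order, and the local variables being
dropped in reverse declaration order.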

If any component of a type implements \lstinline{Drop}, Rust will drop that
component when a value of the type goes out of scope. For example, consider an
enum \lstinline{E} (a tagged union), where one variant needs dropping:

\begin{lstrustsmall}
enum E  {
    A(String),
    B(bool),
}
\end{lstrustsmall}

Even though \lstinline{E} does not have a drop method, when it goes out of scope,
Rust will still insert a drop call because the variant \lstinline{E::A} contains
a droppable type, \lstinline{String}. It's not possible to know at compile-time which
variant of the enum is active, so Rust inserts some additional code which checks
dynamically which variant (if any) to drop.


Since Rust ensures that drop methods are called automatically, it is not
possible to call the drop method for a value (or any of its fields) directly.
This ensures that a value is only dropped once, an important protection against
double-freeing resources. This can be restrictive, because sometimes it's useful
to drop a value earlier than at the end of its scope. Consider a common example,
where a \lstinline{Mutex}'s lock is released from its drop method:

\begin{lstrustsmall}
fn main() {
    let mutex = Mutex::new(123); // A mutex which guards a u64 value.
    let data = mutex.lock().unwrap();
    println!("locked value: {}", data);

    // Code that shouldn't belong in the critical section
    ...
} // lock is released as part of drop.
\end{lstrustsmall}

By unlocking the mutex at the end of main, sometimes we can execute more code
than is necessary in the critical section. Rust provides a standard library
helper function, \lstinline{std::mem::drop} which can accept values of any type
in order to drop them early:

\begin{lstrustsmall}
fn main() {
    let mutex = Mutex::new(123); // A mutex which guards a u64 value.
    let data = mutex.lock().unwrap();
    println!("locked value: {}", data);

    // Release the lock early
    std::mem::drop(data);

    // Code that shouldn't belong in the critical section
    ...
} // The lock was already released on line 7.
\end{lstrustsmall}

\lstinline{std::mem::drop} is implemented as an empty function. Since ownership
of \lstinline{data} is transferred into it (line 7), Rust will run
\lstinline{data}'s drop method at the end of that function's body, i.e.~immediately.
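
The effect of the early drop can be checked with \lstinline{Mutex::try_lock},
as in the sketch below. The function name is illustrative.

```rust
use std::sync::Mutex;

/// Returns whether the mutex could be acquired (before, after) the guard was
/// passed to `std::mem::drop`.
fn lock_availability() -> (bool, bool) {
    let mutex = Mutex::new(123_u64);
    let data = mutex.lock().unwrap();

    // The guard is still alive, so a second attempt to lock fails.
    let before = mutex.try_lock().is_ok();

    // Moving the guard into `drop` runs its destructor, releasing the lock.
    std::mem::drop(data);

    // The temporary guard created here is released at the end of the statement.
    let after = mutex.try_lock().is_ok();
    (before, after)
}

fn main() {
    println!("{:?}", lock_availability());
}
```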

\subsubsection{Drop methods are not guaranteed to run}

Destructors in Rust are guaranteed to run at most once --- but they may not be
run at all. This is for three reasons.

First, consider the example back in \autoref{ch:rust} (\autoref{rc_cycle_drop}) where a cycle is created
between two \lstinline{Rc} values. A reference cycle such as this introduces a
memory leak and thus the values in this data structure are never dropped.

Second, values are only dropped if they are initialised. It is not always
possible to know statically whether a value is initialised, so Rust sometimes
performs dynamic checks to decide whether a value should be dropped. The details
of this are not relevant to the rest of the thesis.

Third, one can explicitly prevent a value from being dropped by passing it to
the \lstinline{std::mem::forget<T>} function. This is commonly used when the
underlying resource originated from non-Rust code, and therefore destruction of
it should happen outside of Rust.
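
Two of these cases are easy to reproduce with the standard library. In the
sketch below, \lstinline{Counted} and \lstinline{Node} are hypothetical types:
the former counts how many times its drop method runs, and the latter allows an
\lstinline{Rc} to point at itself.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Increments a shared counter when dropped.
struct Counted(Rc<Cell<u32>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// A node that may point at another node, allowing a reference cycle.
struct Node {
    _payload: Counted,
    next: RefCell<Option<Rc<Node>>>,
}

/// Creates three `Counted` values and returns how many were actually dropped.
fn drops_observed() -> u32 {
    let drops = Rc::new(Cell::new(0));

    // Dropped normally at the end of the inner scope.
    {
        let _normal = Counted(Rc::clone(&drops));
    }

    // Explicitly forgotten: the destructor never runs.
    std::mem::forget(Counted(Rc::clone(&drops)));

    // A self-referential `Rc` cycle: the strong count never reaches zero,
    // so the node (and its payload) is leaked rather than dropped.
    {
        let node = Rc::new(Node {
            _payload: Counted(Rc::clone(&drops)),
            next: RefCell::new(None),
        });
        *node.next.borrow_mut() = Some(Rc::clone(&node));
    }

    drops.get()
}

fn main() {
    println!("destructors run: {}", drops_observed());
}
```

Only the first of the three values has its drop method run.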

\subsubsection{What types can be dropped?}

\label{bg:drop_what_types}

In short, Rust will automatically call drop for any type which implements the
\lstinline{Drop} trait when it goes out of scope. However, copyable types
(those which implement the \lstinline{Copy} trait, mostly primitive
types such as \lstinline{bool}s, \lstinline{char}s, numeric types and so on)
cannot implement Drop because doing so would mean that
when values are copied they would be dropped multiple times. This
would violate Rust's guarantee that drop is called at most once.

Rust also supports C-like union types, which in contrast to enums do not use
runtime tags to denote the active variant. Union types are not automatically
dropped because there is no way for Rust to know which variant to insert a drop
method for.


\section{Design choices for finalisers in Rust}

Before explaining finalisation in \ourgc, we should ask an over-arching design
question: what should a finaliser in a Rust GC look like? Other approaches to GC
in Rust, such as \rustgc and \bronze define a custom \lstinline{Finalize} trait,
which types can implement to specify finaliser behaviour when they are used in a
\lstinline{Gc} (shown in \autoref{lst:finalise_trait}).

\begin{figure}[t!]
\begin{lstrustsmall}
struct S;

impl Drop for S {
    fn drop(&mut self) {
        println!("Dropping S");
    }
}

impl Finalize for S {
    // Run before collection when value used in a `Gc`.
    fn finalize(&mut self) {
        println!("Finalizing S");
    }
}

fn main() {
    let s1 = S;
    let s2 = S;

    let gc1 = Gc::new(s2);
} // Dropping s1
\end{lstrustsmall}
    \caption{An example from \rustgc, where a custom \lstinline{Finalize} trait
    is used for finalisation semantics. In this scenario, before \lstinline{s2}
    is collected, \rustgc calls \lstinline{S}'s \lstinline{finalize} method
    (line 11).}
    \label{lst:finalise_trait}
\end{figure}


The benefit of this approach is that it creates a logical separation between
destructors expected to run in an RAII-based context and GC finalisers. This
allows finalisers, which have subtly different rules to destructors, to be
correctly specified by the user (as we will see in
\autoref{ourgc:threadsafe_finalisation}, there are specific restrictions that need to be placed on
finalisers in Rust in order to guarantee soundness).

\ourgc takes a different approach, however, as separating destruction and
finalisation in this way has unfortunate consequences. First, for most types
that already implement \lstinline{Drop}, their destruction logic must be
duplicated in a finaliser. This is, at least, significant extra effort; it also
offers many opportunities for copy-and-paste errors.

Second, a separate \lstinline{Finalize} trait has a major ergonomic cost
because it is not possible to implement \lstinline{Finalize} for types from
external libraries. This is because Rust enforces \emph{trait coherence}, a
property of the language which ensures that every type has at most one
implementation of a given trait. This coherence rule is fundamental to the
language, because it removes ambiguity in trait method resolution, ensuring
there is only one implementation of a trait method to choose from.

Trait coherence is a problem for programs that use external compilation
units known as \emph{crates} (roughly speaking, `libraries'), because if two
unrelated crates provide separate implementations of the same trait for the
same type, then those crates cannot be imported together.
\begin{figure}[t!]
% Hack to get the listing in the figure
\newsavebox{\orphanalisting}
\begin{lrbox}{\orphanalisting}
\begin{lstrustsmall}
use a::{MyType, MyTrait};

impl MyTrait for MyType {
    fn method1() {
        ...
    }

    fn method2() {
        ...
    }
}

\end{lstrustsmall}
\end{lrbox}

\newsavebox{\orphanblisting}
\begin{lrbox}{\orphanblisting}
\begin{lstrustsmall}
error[E0117]: only traits defined in the current
              crate can be implemented for types
              defined outside of the crate
 --> src/lib.rs:3:1
3 | impl MyTrait for MyType {}
  | ^^^^^^^^^^^^^^^^^------
  | |                |
  | |                `MyType` is not defined in
  | |                the current crate
  | impl doesn't use only types from inside the
  | current crate

\end{lstrustsmall}
\end{lrbox}
    \centering
    \subfloat[\centering Invalid trait implementations]{\usebox{\orphanalisting}}
    \hfill
    \subfloat[\centering Compiler error]{\usebox{\orphanblisting}}
    \caption{
        Here, we try to provide an implementation of the externally defined
        trait, \lstinline{MyTrait} for the externally defined type,
        \lstinline{MyType}. This results in a compile error in Rust because it
        violates the orphan rule.}
    \label{lst:coherence}
\end{figure}

To address this, traits in Rust must adhere to the \emph{orphan rule}. The rule
is simple: it is not possible to implement a trait for a type when both the
trait and the type are defined outside the current crate. This prevents
multiple conflicting trait implementations from existing across crates.
\autoref{lst:coherence} shows how the orphan rule is enforced at compile-time in
Rust.

The problem with the orphan rule is that it would become a major source of
ergonomic frustration for \ourgc if it defined a separate \lstinline{Finalize}
trait. It would not be possible to implement \lstinline{Finalize} for any type
which was not defined in the user's current crate. If types from external crates
do not provide their own implementations for \lstinline{Finalize}, then those
types may cause resource leaks when used in a \lstinline{Gc}.

A workaround for the orphan rule is to use the \emph{new-type idiom}, where the
current crate defines a wrapper type for an external type. Unfortunately, this
workaround can be cumbersome to write and makes types in Rust harder to read.
\autoref{lst:orphan_workaround} shows how the new-type idiom can be used to add
a finaliser to a type defined outside of the current crate. This can be used in
other GC designs for Rust which use a separate \lstinline{Finalize} trait such
as \rustgc.

\begin{figure}[t!]
\begin{lstrustsmall}
use a::MyType;

struct Wrapper(MyType);

impl Finalize for Wrapper {
    fn finalize(&mut self) {
        println!("Finalizing MyType via Wrapper");
    }
}

fn main() {
    let a = Gc::new(Wrapper(MyType::new()));
}
\end{lstrustsmall}
\caption{A workaround for the orphan rule using the \emph{new-type idiom}. Here, a
    new \lstinline{Wrapper} type is defined for which we define a finaliser. To
    garbage collect \lstinline{MyType} objects, one could then use
    \lstinline{Gc<Wrapper>} instead of \lstinline{Gc<MyType>} to ensure that its
    finaliser is called.}
    \label{lst:orphan_workaround}
\end{figure}
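
To give a sense of the boilerplate the new-type idiom involves, the following
sketch adds the forwarding implementations a wrapper typically needs before it
can be used like the type it wraps. Everything here is an assumption made for
illustration: \lstinline{MyType} and \lstinline{Finalize} are local stand-ins
for an external type and a GC library's finaliser trait, so in this
single-file form the orphan rule is not actually exercised.

```rust
use std::ops::{Deref, DerefMut};

// Stand-in for a type defined in an external crate (hypothetical).
pub struct MyType {
    pub name: String,
}

impl MyType {
    pub fn greet(&self) -> String {
        format!("hello from {}", self.name)
    }
}

// Stand-in for a GC library's finaliser trait (hypothetical signature).
pub trait Finalize {
    fn finalize(&mut self);
}

// The wrapper is local to the current crate, so implementing an external
// trait for it would be permitted by the orphan rule.
pub struct Wrapper(pub MyType);

impl Finalize for Wrapper {
    fn finalize(&mut self) {
        // Placeholder clean-up: release the wrapped value's resources.
        self.0.name.clear();
    }
}

// Without these two impls, every use of the wrapped value must go via `.0`.
impl Deref for Wrapper {
    type Target = MyType;
    fn deref(&self) -> &MyType {
        &self.0
    }
}

impl DerefMut for Wrapper {
    fn deref_mut(&mut self) -> &mut MyType {
        &mut self.0
    }
}

fn wrapped_greeting() -> String {
    let w = Wrapper(MyType { name: String::from("a") });
    // The method call resolves through `Deref` to `MyType::greet`.
    w.greet()
}
```

Even with these forwarding impls, the wrapper remains a distinct type:
functions that expect a \lstinline{MyType} by value do not accept a
\lstinline{Wrapper}.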

\begin{lstrustsmall}
let a = Box::new(String::from("Hello"));
let b = Gc::new(a);
\end{lstrustsmall}

The sole owning reference to the heap allocation \lstinline{Box<String>} is
moved into \lstinline{Gc::new}, which creates a \lstinline{Gc} object containing
the reference to the \lstinline{Box<String>}. This has the following
representation in memory:

\begin{center}
\includegraphics[width=0.75\textwidth]{images/alloy_finaliser_memory}
\end{center}

When the \lstinline{Gc}'s underlying allocation (called \lstinline{GcBox}) becomes
unreachable, \ourgc will call its finaliser, which means that \lstinline{drop}
is called on all the component types (in the same way that Rust automatically
calls drop in \autoref{rust_drop_order}). If, for whatever reason, the finaliser
is not run, then the allocations for the \lstinline{Box} and the
\lstinline{String} will leak (i.e.~their heap allocations will never be
reclaimed). We thus define a finaliser in \ourgc as calling drop on the contents
of a \lstinline{Gc} (including its field types). Therefore a type
\lstinline{Gc<T>} has a finaliser if type \lstinline{T} needs dropping.
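
The notion of a type that ``needs dropping'' can be approximated in standard
Rust with \lstinline{std::mem::needs_drop}. The sketch below is illustrative
only: whether \ourgc uses this function internally is an assumption, and
\lstinline{would_have_finaliser} is a hypothetical helper.

```rust
use std::mem::needs_drop;

// No `Drop` impl and no fields that need dropping.
struct Plain {
    value: u8,
}

// Has an explicit destructor, even though it does nothing.
struct WithDestructor;

impl Drop for WithDestructor {
    fn drop(&mut self) {}
}

// Hypothetical helper: under the definition above, would `Gc<T>` have a
// finaliser?
fn would_have_finaliser<T>() -> bool {
    needs_drop::<T>()
}
```

Under this approximation, \lstinline{Gc<u8>} and \lstinline{Gc<Plain>} would
have no finaliser, whereas \lstinline{Gc<Box<String>>} and
\lstinline{Gc<WithDestructor>} would.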

\subsection{Omitting finalisers}

Finalisation is not always desirable. For example, consider a \lstinline{FileDescriptor}
which uses its drop method to close the descriptor:

\begin{lstrustsmall}
struct FileDescriptor {
    fd: u64
}

impl Drop for FileDescriptor {
    fn drop(&mut self) {
        self.close();
    }
}
\end{lstrustsmall}

Here, objects of type \lstinline{Gc<FileDescriptor>} would use a finaliser to
call the \lstinline{FileDescriptor}'s drop method. However, if we were to close
the descriptor in the mutator once we are finished with the object, the
finaliser is no longer necessary:

\begin{lstrustsmall}
let stdout = FileDescriptor { fd: 1 };
let descriptor = Gc::new(stdout);
...
descriptor.close()
\end{lstrustsmall}

To allow for this, \ourgc provides a special wrapper type,
\lstinline{NonFinalizable<T>}, which can be used to create a \lstinline{Gc}
which omits finalisers on an individual basis:

\begin{lstrustsmall}
let descriptor = Gc::new(NonFinalizable::new(stdout));
\end{lstrustsmall}

Here, when the \lstinline{Gc<NonFinalizable<FileDescriptor>>} is collected, it
will not be finalised. The \lstinline{NonFinalizable<T>} type has no additional
storage costs, and at runtime is represented as a bare
\lstinline{FileDescriptor}.

This is only intended to be used in exceptional circumstances where performance
is a concern: it can easily lead to resource leaks if not used carefully.
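
One way such a zero-cost wrapper could be built in standard Rust is on top of
\lstinline{ManuallyDrop}. The following is a sketch under that assumption, not
\ourgc's implementation: \lstinline{NoFinalise} is a hypothetical stand-in for
\lstinline{NonFinalizable<T>}.

```rust
use std::mem::ManuallyDrop;
use std::ops::Deref;

// Hypothetical stand-in for a "do not finalise" wrapper. `ManuallyDrop<T>`
// has the same layout as `T` and suppresses the automatic call to `drop`.
#[repr(transparent)]
struct NoFinalise<T>(ManuallyDrop<T>);

impl<T> NoFinalise<T> {
    fn new(value: T) -> Self {
        NoFinalise(ManuallyDrop::new(value))
    }
}

impl<T> Deref for NoFinalise<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.0
    }
}

struct FileDescriptor {
    fd: u64,
}

impl Drop for FileDescriptor {
    fn drop(&mut self) {
        // A real implementation would close `self.fd` here.
    }
}
```

With these definitions the wrapper has the same size as the
\lstinline{FileDescriptor} it contains, and it no longer needs dropping, so
under the definition of a finaliser given earlier it would not require one.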

The most obvious solution to this is to ensure that only thread-safe types
can be used inside a garbage collected container. In other words, a type
\lstinline{T} could not be placed inside a \lstinline{Gc} unless \lstinline{T}
implements both \lstinline{Send} and \lstinline{Sync} -- Rust's builtin traits
for concurrency safety (see \autoref{sendsync}). This solution would
prevent programs from compiling if an object without synchronisation is placed
inside a \lstinline{Gc} container. While this would ensure that finalisers are
thread-safe, it is less than ideal for two reasons.

First, it would restrict a \lstinline{Gc} from managing many valid types: a
non-\lstinline{Send} and non-\lstinline{Sync} type would be prevented from being
used in a \lstinline{Gc} even if it doesn't have a drop method (and therefore,
never needed finalising in the first place!).

Second, for \lstinline{T} to be \lstinline{Send} and \lstinline{Sync}, all of
\lstinline{T}'s component types must be \lstinline{Send} and \lstinline{Sync}
too. This presents a dilemma: either every field of \lstinline{T} must be
thread-safe (even those which are never used in a finaliser); or the user,
certain in the knowledge that \lstinline{T}'s drop method is thread-safe,
forcibly implements \lstinline{Send} and \lstinline{Sync} on \lstinline{T}
using \lstinline{unsafe}. In the latter case, \lstinline{T} can then
accidentally be used in concurrency contexts unrelated to garbage collection,
bypassing an important part of the type system in order to keep \ourgc happy.
\autoref{send_sync_dilemma} shows how this could cause a leaky abstraction which
introduces bugs in non-GC related code.


\section{Related work}
\label{sec:related_work}

Throughout Rust's history, there have been several attempts to introduce some
form of tracing garbage collection~\citep{felix15specifying, felix16roots,
manish16gc}. In fact, early versions of Rust explored using a form of this as a
first class feature of the language through the use of \emph{managed pointers}
(with the syntax \lstinline{@T}).\jake{cite: } This was removed early in Rust's
development before the first stable release \jake{cite removal commit
\lstinline{https://github.com/rust-lang/rust/commit/ade807c6dcf6dc4454732c5e914ca06ebb429773}}.

\rustbacon~\citep{rustbacon} is a Rust library implementation of the cycle
collection algorithm of~\citet{bacon01concurrent},
which provides reference counting with tracing cycle collection. Like \ourgc, it
introduces a user-visible type (\lstinline{Cc<T>}) for managing memory. The
underlying memory of a \lstinline{Cc<T>} object is reference counted. However,
when the reference count is decremented to a non-zero value, the object is added
to a worklist so that it can later be checked for potential cycles by a
user-invoked \lstinline{collect_cycles} function, which performs a local trace.

Unlike \ourgc, destruction in \rustbacon is deterministic: it occurs as
soon as an object's last owner is gone; in the case of cyclic garbage, this
means after the user calls \lstinline{collect_cycles}. However, like \ourgc, it
relies on the \lstinline{Drop} trait for destruction and does not implement
its own \lstinline{Finalize} trait. \lstinline{Cc<T>} uses a
\lstinline{T: 'static} bound, which prevents it from containing any regular
Rust references.
This prevents the trapdoor reference in finalisers problem explained in
\jake{section FSA}.

There are several advantages of \rustbacon over \ourgc: first, it is
purely library based, requiring no modifications to the Rust compiler to work;
second, for non-cyclic data, values will be dropped immediately like
\lstinline{Rc}; third, the deterministic nature of drop methods means that they
do not need to be run on a separate thread, eliminating a whole class of
synchronisation and concurrency errors present in traditional tracing-based
finalisation \jake{cite fsa section}.

It also has several limitations when compared to \ourgc. First, it is
single-threaded (though a concurrent cycle detection algorithm is theoretically
possible~\citep{bacon01concurrent}). Second, a \lstinline{Trace} trait,
detailing how to traverse an object's fields during cycle detection, must be
manually implemented for any \lstinline{T} used inside a \lstinline{Cc}. This
means that it cannot be used with types from external crates without the
new-type workaround. Third, since drop methods are called on cyclic garbage,
drop methods which dereference fields of other \lstinline{Cc} objects will
result in a crash at runtime. \ourgc, on the other hand, will prevent such
programs from compiling with FSA.

In \autoref{sec:destructor challenges}, we introduced the most well-known tracing
GC option in Rust \citep{rustgc}. \jake{That section already discusses the
finalisation differences, so I won't repeat them here} The API for \rustgc is
similar to \ourgc, with the notable exception that \lstinline{Gc} in \rustgc
does not implement the \lstinline{Copy} trait. This means that in order to
obtain additional pointers to garbage-collected objects, the \lstinline{Gc}
must be cloned.

\laurie{what is the practical difference between \rustbacon and \rustgc?
from the description they sound identical from a user's perspective} \jake{the main difference is that a cycle collection must be explicitly invoked by the user for \rustbacon}
\rustgc is implemented as a hybrid form of reference counting and tracing GC.
There is no mechanism for scanning the stack for roots as in traditional GC, so
roots are tracked using reference counting, with a mark-sweep then performed
from these roots. Like \ourgc, \lstinline{Gc} references in \rustgc point to an
underlying \lstinline{GcBox}. However, in \rustgc, this \lstinline{GcBox}
maintains a count of all of its roots. \autoref{lst:rustgc_roots} shows how this
count is updated as references are used. During a collection, the
\lstinline{GcBox}es on the heap are enumerated, and those with a non-zero root
count are used as roots to begin marking. Like \rustbacon, \rustgc traces
through objects by requiring types used in \lstinline{Gc} to implement a
\lstinline{Trace} trait, which has a \lstinline{trace} method called during
marking to traverse and mark objects during a collection.
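
The root-counting scheme described above can be sketched as a toy model. This
is not \rustgc's code: indices stand in for \lstinline{Gc} pointers, and the
\lstinline{Heap}, \lstinline{mark}, and \lstinline{unmarked} names are
hypothetical.

```rust
// A toy model of root-counted marking. Indices stand in for `Gc` pointers;
// none of these names come from a real library.
struct GcBox {
    roots: usize,         // live root handles pointing at this box
    children: Vec<usize>, // references held in this box's fields
    marked: bool,
}

struct Heap {
    boxes: Vec<GcBox>,
}

impl Heap {
    // Mark everything reachable from boxes whose root count is non-zero.
    fn mark(&mut self) {
        for b in &mut self.boxes {
            b.marked = false;
        }
        let mut worklist: Vec<usize> = self
            .boxes
            .iter()
            .enumerate()
            .filter(|(_, b)| b.roots > 0)
            .map(|(i, _)| i)
            .collect();
        while let Some(i) = worklist.pop() {
            if self.boxes[i].marked {
                continue;
            }
            self.boxes[i].marked = true;
            worklist.extend(self.boxes[i].children.iter().copied());
        }
    }

    // Indices of the boxes a sweep would reclaim after `mark`.
    fn unmarked(&self) -> Vec<usize> {
        self.boxes
            .iter()
            .enumerate()
            .filter(|(_, b)| !b.marked)
            .map(|(i, _)| i)
            .collect()
    }
}

fn new_box(roots: usize, children: Vec<usize>) -> GcBox {
    GcBox { roots, children, marked: false }
}

fn example_heap() -> Heap {
    Heap {
        boxes: vec![
            new_box(1, vec![1]), // box 0: rooted, points to box 1
            new_box(0, vec![]),  // box 1: reachable from box 0
            new_box(0, vec![3]), // box 2: unrooted cycle 2 -> 3 -> 2
            new_box(0, vec![2]), // box 3
        ],
    }
}
```

In the example heap only box 0 has a non-zero root count, so marking reaches
boxes 0 and 1 and leaves the unrooted cycle formed by boxes 2 and 3 to be
reclaimed.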

\rustgc makes implementing \lstinline{Trace} easy by providing a macro
implementation, where types can be annotated with \lstinline{#[derive(Trace)]} and
have it implemented automatically. It uses its own type, \lstinline{GcCell}, in
order to support interior mutability, as a \lstinline{RefCell} cannot be used
with \rustgc. The \lstinline{GcCell} provides additional support for rooting and
unrooting objects across a borrow as they are mutated inside the \lstinline{Gc}.
It provides a similar API to the user as \lstinline{RefCell}.

Unlike \ourgc, objects are finalised by implementing a \lstinline{Finalize}
trait. Though this avoids much of the complexity that \ourgc needs in order to
support calling \lstinline{T::drop} from a finaliser or destructor context,
\rustgc requires the programmer to ensure that a finaliser implementation is
present for any type that may need to call \lstinline{drop} on any of its
component types. It is not easy to know which of these component types may need
dropping, and forgetting to provide a finaliser can cause memory leaks.

The \emph{Bronze} collector is an optional GC implementation for Rust which was
designed to address usability concerns with Rust's borrow
semantics~\citep{coblenz21bronze}. It was
designed alongside an empirical study which measured how long it took students
to complete a variety of Rust tasks using standalone Rust, and using Rust with
Bronze for managing memory.

Bronze bases much of its implementation on \rustgc but with two key differences.
First, it tracks roots to GC objects by using a modified version of the Rust
compiler. Bronze's modified compiler inserts calls to LLVM's
\lstinline{gc.root} intrinsic at function entries in order to generate
stackmaps. When a collection is requested, Bronze iterates over the stackmaps
generated for its current call stack in order to locate the roots for garbage
collection. However, Bronze does not implement this for transitive references
from arbitrary objects. In other words, if a \lstinline{Gc<T>} exists as a field
inside another object instead of directly on the stack, it is not tracked
as a root for garbage collection.

The second major difference between \rustgc and Bronze is that Bronze's
\lstinline{Gc<T>} type allows the programmer to dereference its underlying type
\lstinline{T} mutably more than once. \citet{coblenz21bronze} describes this
as beneficial, because it makes it easier to use than other forms of shared
ownership in Rust. However, this is fundamentally unsound, and allows programs
which violate memory safety to be written in safe Rust using Bronze.
\autoref{lst:bronze_unsound} shows an example of how this can violate memory
safety by causing a write to deallocated memory.


\begin{figure}[!t]
\begin{lstrustsmall}
fn main() {
    let mut gr1 = GcRef::new(vec![1u16,2,3]);
    let mut gr2 = gr1.clone();

    let ref1 = gr1.as_mut();
    let ref2 = gr2.as_mut();

    // ref1 and ref2 now reference the same object:
    ref1.push(4);
    ref2.push(5);
    ref1.push(6);

    let ref1elem0 = ref1.get_mut(0).unwrap();
    // Force reallocation of the underlying vec
    ref2.resize(1024, 0);
    // Now this writes to deallocated memory
    *ref1elem0 = 42;
}
\end{lstrustsmall}
    \caption{An example of unsoundness in Bronze based on its ability to allow
    aliased mutable references. Here, we obtain two mutable references to the
    same underlying vector (lines 5--6), before using the second reference to
    resize the vector, which forces its backing store to be reallocated in
    memory (line 15). Later, when we try to write to an element through the
    first reference, it no longer points to valid memory (line 17).
    } \label{lst:bronze_unsound}
\end{figure}
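
For contrast, the following sketch shows how Rust's standard
\lstinline{RefCell} rejects the same aliasing pattern at run time. It uses
\lstinline{Rc} from the standard library purely so that the example is
self-contained; the function name is hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns whether a second mutable borrow succeeds while the first is live.
fn second_mutable_borrow_succeeds() -> bool {
    let v1 = Rc::new(RefCell::new(vec![1u16, 2, 3]));
    let v2 = Rc::clone(&v1);

    let mut ref1 = v1.borrow_mut();
    ref1.push(4);

    // `ref1` is still live, so `RefCell` refuses to hand out a second
    // mutable borrow of the same vector through `v2`.
    let second = v2.try_borrow_mut().is_ok();

    ref1.push(5);
    second
}
```

Because the second borrow is refused, the reallocation that invalidates
\lstinline{ref1elem0} in \autoref{lst:bronze_unsound} cannot be expressed this
way; using \lstinline{borrow_mut} in place of \lstinline{try_borrow_mut} would
panic instead.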

\shifgrethor~\citep{shifgrethor} is an experimental GC API for Rust which investigated a way for
potential GC implementations to precisely identify and trace roots to GC'd
objects. \shifgrethor is therefore not a full GC library, but instead an
experimental design for how a GC could interface with the language.

The basic idea is that in order to create a \lstinline{Gc} object, it must be
created by, and exist alongside, a corresponding \lstinline{Root<'root>} type on
the stack. The \lstinline{Root<'root>} can then hand out references to the
underlying \lstinline{Gc} which are tied to \lstinline{Root<'root>}'s lifetime.

\laurie{can \gcarena handle cycles? what about finalisers?}
\gcarena~\citep{gcarena} is another experimental approach to sound GC design in Rust. It was
originally developed as part of the \emph{luster} VM~\citep{luster}: an experimental Lua VM
written in Rust. Unlike \ourgc and the other approaches to GC in Rust seen so
far, \gcarena does not retrofit Rust with a GC. Instead, it provides limited
garbage collection in isolated garbage collected \emph{arenas}. Arenas carefully
guard mutator access to their objects through closures, which, when executing,
prevent the collector from running. This solves the difficult problem of finding
roots which reside on the stack: when an arena is \emph{closed} to the mutator,
no stack roots exist, so a collection can be safely scheduled. A single arena
may contain several garbage collected objects, but they cannot be
transferred between arenas.

Because \gcarena is so different in nature to the other GCs described in this
section, it is difficult to compare it ergonomically to other approaches.

\section{New performance numbers}
\subsection{Performance}
\begin{figure}[t!]
    \centering
    \includegraphics[width=1\textwidth]{graphs/som_rs_perf.pdf}
    \caption{Results from the \somrs micro-benchmark experiment, where the SOM
    benchmark suite is run to compare two configurations of \somrs:
    \somrsrcbdwgc, where SOM objects are managed with RC but use the BDWGC's
    allocator with GC disabled; and \somrsgc, where SOM objects are managed
    using \ourgc's GC library, which uses the BDWGC's allocator. Each benchmark
    is run for 100 process executions, where the error bars represent 99\%
    confidence intervals.}
\label{graph:som_rs_perf}
\end{figure}

\subsection{Finaliser elision}

\begin{figure}[t!]
    \centering
    \includegraphics[width=1\textwidth]{graphs/som_rs_finalisers.pdf}
    \caption{Results of a performance comparison of \somrs using two
    configurations: naive
    finalisation (where no optimisation is performed); and \ourgc's finaliser
    elision optimisation (where finalisers which are used only to deallocate
    memory are removed). Each benchmark is run for 30 process executions, where
    the error bars represent 99\% confidence intervals.}
\label{graph:som_rs_finalisers}
\end{figure}

\begin{figure}[t!]
    \centering
    \includegraphics[width=1\textwidth]{graphs/yksom_finalisers.pdf}
    \caption{Results of a performance comparison of \yksom using two
    configurations: naive
    finalisation (where no optimisation is performed); and \ourgc's finaliser
    elision optimisation (where finalisers which are used only to deallocate
    memory are removed). Each benchmark is run for 30 process executions, where
    the error bars represent 99\% confidence intervals.}
\label{graph:yksom_finalisers}
\end{figure}

\subsection{Early finaliser prevention}

\begin{figure}[t!]
    \centering
    \includegraphics[width=1\textwidth]{graphs/som_rs_barriers.pdf}
     \caption{Results showing the performance of early finaliser prevention on
     \somrs, which uses three configurations: \textit{None} where there are
     no compiler barriers (which is unsound); \textit{All}, where every
     single \lstinline{Gc} reference has a corresponding barrier; and
     \textit{None}, where barriers can be optimised away where \ourgc is
     certain they are unnecessary. Each configuration is compared using a
     subset of benchmarks on the ReBench benchmarking suite for 30 process
     executions, where we report 99\% confidence intervals.}
\label{graph:som_rs_barriers}
\end{figure}

\subsection{Comparison with other GCs}
\label{othergcs}

Our fifth and final experiment aims to understand the performance costs of
\ourgc relative to other garbage collected approaches in Rust.

The overall question we would like this experiment to answer is: how fast is
\ourgc when compared with the other garbage collected options available in Rust?
While we have been able to provide a more detailed assessment of \ourgc's
performance when used to implement other languages, we are only able to provide
a limited comparison of \ourgc's performance relative to other collectors:
converting benchmarks to Luster's unusual approach was prohibitively
difficult; and Bronze crashed with many benchmarks. In addition, we had
difficulty finding suitable Rust programs which were practical enough to modify
to use the various tracing GC implementations while also performing enough heap
allocations to be useful. Fortunately, and despite these restrictions, this
experiment is still able to provide valuable insights.

In this experiment, we compare the performance of \ourgc against three different
approaches to managing memory: an indexed arena,
\rustgc (\autoref{alloy_related}), and Rust's standard reference counting library. We
run each configuration on the Binary Trees benchmark from the Computer Language
Benchmarks Game for 30 process executions. The wall-clock time of each process
execution is recorded using the multitime tool.
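
To give a sense of the workload, the kernel of a Binary Trees style benchmark
can be sketched as follows. This is a simplified illustration using
\lstinline{Rc} and is not the benchmark code used in the experiment; the
\lstinline{Node}, \lstinline{bottom_up_tree}, and \lstinline{item_check} names
are assumptions.

```rust
use std::rc::Rc;

// A node in a binary tree; every node is a separate heap allocation.
struct Node {
    left: Option<Rc<Node>>,
    right: Option<Rc<Node>>,
}

// Builds a perfect binary tree of the given depth, allocating each node.
fn bottom_up_tree(depth: u32) -> Rc<Node> {
    if depth == 0 {
        Rc::new(Node { left: None, right: None })
    } else {
        Rc::new(Node {
            left: Some(bottom_up_tree(depth - 1)),
            right: Some(bottom_up_tree(depth - 1)),
        })
    }
}

// Walks the whole tree and counts its nodes.
fn item_check(node: &Node) -> u32 {
    let left = node.left.as_ref().map_or(0, |n| item_check(n));
    let right = node.right.as_ref().map_or(0, |n| item_check(n));
    1 + left + right
}
```

A tree of depth $d$ contains $2^{d+1}-1$ nodes, so repeatedly building and
discarding such trees is dominated by allocation and deallocation, which is
what makes a benchmark of this shape sensitive to the memory management
strategy.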

\makeatletter
\newcommand*\ExpandableInputBinTrees[1]{\@@input#1 }
\makeatother
 \begin{table*}[t]
     \begin{center}
     \begin{tabular}{llll}
 \toprule
\ExpandableInputBinTrees{./table_binary_trees.tex}
 \bottomrule
     \end{tabular}
 \vspace{6pt}
     \caption{Results from our experiment comparing \ourgc against three other
         memory management configurations on the Binary Trees benchmark for 30
         process executions, where we report 99\% confidence intervals. This
         shows that except for the index-arena (which deallocates all
         its memory at once) \ourgc is the fastest configuration.}
         \label{table:binary_trees}
     \end{center}
\end{table*}

\subsection{How other languages deal with finalisers}

The D programming language uses a conservative mark-sweep GC for heap
allocations by default with support for opt-in explicit deallocation. Like \ourgc, D does
not specify an order for finalisers. Users can specify their own allocators for
RAII-based heap allocation using standard \lstinline{malloc}/\lstinline{free}
calls.

Oilpan, the conservative mark-sweep garbage collector in Chrome's rendering
engine, Blink, uses two levels of finalisation: full finalisers, which run
off-thread after a GC cycle has completed but do not allow object fields to be
dereferenced; and pre-finalisers, which run during the sweep phase but allow
all of an object's fields to be dereferenced. Pre-finalisers are generally
avoided because of their performance overhead: they cannot be scheduled to
run on other threads, and they increase the stop-the-world time.

\jake{TODO: Nim}


\subsection{GC in Rust}

\subsection{Results}

\autoref{table:binary_trees} shows the results for this experiment. The results
show that for Binary Trees (an allocation-heavy benchmark)
\textsc{Typed-arena} was the fastest, as it never performs any deallocation
during the benchmark run: it simply deallocates all memory at the end.

\rustgc performs poorly for two reasons. First, it uses a form of reference
counting to track the roots for each garbage collected object. Second, it has a
naive implementation of the mark-sweep algorithm and does not use parallel
collector threads.


\bibliographystyle{ACM-Reference-Format}
\bibliography{bib}

\end{document}