RCL is tested extensively. The two primary means of testing are golden tests and fuzzing.
Golden tests are located in the golden directory as files ending in .test.
Every file consists of two sections: an input, then a line "# output:", and then an expected output. The test runner golden/run.py takes these files, executes RCL, and compares the actual output against the expected output. The mode in which run.py executes RCL depends on the subdirectory that the tests are in. See the docstring in run.py for more information.
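As a rough illustration of the file layout described above, the following sketch splits a golden file into its input and expected output. It is not the logic from run.py, which may differ; the function name split_golden and the assumption that the separator is a line consisting of exactly "# output:" are hypothetical.

```python
def split_golden(text: str) -> tuple[str, str]:
    """Split a golden test file into (input, expected_output).

    Assumption: the separator is a line that reads exactly "# output:".
    The real test runner may treat the separator differently.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") == "# output:":
            # Everything before the separator is the input,
            # everything after it is the expected output.
            return "".join(lines[:i]), "".join(lines[i + 1:])
    raise ValueError("missing '# output:' separator line")


# Placeholder contents, not real RCL input or output.
test_input, expected = split_golden("input line\n# output:\nexpected line\n")
```

A runner built on this would execute RCL on test_input and compare what it prints against expected.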
The goal of the golden tests is to cover all relevant branches of the code. For example, every error message that RCL can generate should be accompanied by a test that triggers it. It is instructive to inspect the coverage report to see what tests are missing, or to discover pieces of dead code. The easiest way to generate the report is with Nix:
nix build .#coverage --out-link result
xdg-open result/index.html
Alternatively, you can build with coverage support and run grcov manually; see flake.nix for the exact flags and commands.
The other means of testing are fuzz tests. RCL contains two primary fuzzers,
called fuzz_source and fuzz_smith. The inputs to the source-based fuzzer
are valid RCL files that contain a little bit of metadata about
what mode to exercise. Putting everything in one fuzzer enables sharing the
corpus across the different modes. The inputs to the smith-based fuzzer are
small bytecode programs for a smith that synthesizes RCL programs
from them. The smith fuzzer is a lot faster than the source-based fuzzer,
because it does not have to overcome trivial obstacles like having balanced
parentheses or using the same name in a variable definition and usage site.
Getting past such obstacles is hard for the source-based fuzzer because it
requires two separate mutations to work together to make a valid file. The
downside of the smith-based fuzzer is that it does not explore the full input
space, in particular regarding parse errors, which is why a combination of both
fuzzers is ideal.
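To make the idea of a smith concrete, here is a toy sketch with a made-up three-instruction bytecode. It is not RCL's smith, whose instruction set is not reproduced here; the name synthesize and the opcode layout are hypothetical. The point it illustrates is that every byte string, however the fuzzer mutates it, produces an expression with balanced brackets.

```python
def synthesize(program: bytes) -> str:
    """Interpret each byte as one instruction of a toy stack machine.

    Hypothetical opcodes (byte % 3), with the argument taken from byte // 3:
      0: push the argument as an integer literal
      1: pop up to (argument % 4) items and wrap them in a list
      2: pop two items and join them with "+", if two are available
    """
    stack: list[str] = []
    for op in program:
        kind, arg = op % 3, op // 3
        if kind == 0:
            stack.append(str(arg))
        elif kind == 1:
            n = min(arg % 4, len(stack))
            start = len(stack) - n
            items = stack[start:]
            del stack[start:]
            stack.append("[" + ", ".join(items) + "]")
        elif len(stack) >= 2:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} + {rhs})")
    # The top of the stack is the synthesized expression.
    return stack[-1] if stack else "[]"


print(synthesize(bytes([3, 6, 2])))  # prints "(1 + 2)"
```

Because brackets are only ever emitted in matched pairs, a single-byte mutation always yields another well-formed expression, whereas a source-level fuzzer would need two coordinated mutations to add both an opening and a closing bracket.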
The basic modes of the fuzzer just test that the lexer, parser, and evaluator do not crash. They are useful for finding mistakes such as slicing a string across a code point boundary, or trying to consume tokens past the end of the document. However, the more interesting modes are the ones that exercise the formatter and the value pretty-printer. They verify the following properties:
- The formatter is idempotent. Running rcl fmt on an already-formatted file should not change it. This sounds obvious, but the fuzzer caught a few interesting cases related to trailing whitespace and multiline strings.
- The evaluator is idempotent. RCL can evaluate to json, and is itself a superset of json, so evaluating the same input again should not change it.
- The formatter agrees with the pretty-printer. When RCL evaluates to json, it pretty-prints the json document using the value pretty-printer. The formatter that formats source files, on the other hand, pretty-prints the concrete syntax tree. This check tests that formatting a pretty-printed json value is a no-op.
- The json output can be parsed by Serde. Even if RCL can read back a json output, that is no guarantee that a third-party parser considers the output valid, so we also test against the widely used Serde deserializer.
- The toml output can be parsed by Serde. Same as above, but for toml output.
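The idempotency properties in the list above all have the same shape: apply a transformation once, apply it again, and require that the second application changes nothing. A minimal sketch of that check follows. The two toy transformations are stand-ins invented for illustration, not RCL's formatter or evaluator, and all function names are hypothetical.

```python
from typing import Callable


def is_idempotent(transform: Callable[[str], str], source: str) -> bool:
    """Return True if applying transform a second time changes nothing."""
    once = transform(source)
    twice = transform(once)
    return once == twice


def strip_trailing_whitespace(text: str) -> str:
    """Toy formatter: remove trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_spaces_once(text: str) -> str:
    """Deliberately flawed toy formatter: a single replacement pass
    leaves work undone when there are runs of more than two spaces."""
    return text.replace("  ", " ")


print(is_idempotent(strip_trailing_whitespace, "a  \nb\t\n"))
print(is_idempotent(collapse_spaces_once, "a    b"))
```

The second transformation turns four spaces into two on the first pass and into one on the second, which is the kind of discrepancy a fuzzer checking this property would report.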
To run the fuzzers you need cargo-fuzz and a nightly toolchain. Then start e.g. the smith fuzzer or source-based fuzzer:
cargo +nightly-2023-06-03 fuzz run fuzz_smith -- -timeout=3
cargo +nightly-2023-06-03 fuzz run fuzz_source -- -dict=fuzz/dictionary.txt -timeout=3
Unlike the other fuzzers, the inputs to the smith fuzzer are not human-readable.
To see what kind of inputs the fuzzer is exploring, you can use the smithctl
program to interpret the smith programs:
cargo build --manifest-path=fuzz/Cargo.toml --bin smithctl
To print all inputs discovered in the past 5 minutes:
fd . fuzz/corpus/fuzz_smith --changed-within 5m --exec target/debug/smithctl print
There is less of a focus on unit tests. Constructing the right environment and values for even a very basic test quickly becomes unwieldy; even the AST of a few lines of code, when constructed directly in Rust, becomes hundreds of lines of code. Rather than constructing the AST by hand, we can just use the parser to construct one. Similarly, for expected output values, instead of constructing these in Rust, we can format them, and express the entire process as a golden test instead.