This repository contains the Docker setup and experiment configurations we use to reproduce the claims from FuzzJIT (paper). The first part of this README presents our results; the second part guides you through setting up our environment so that you can reproduce them.
The FuzzJIT paper targets (among others) V8, JSC, and SpiderMonkey. On all three engines, the authors report a significant increase in code coverage as well as an improved semantic correctness rate. The latter metric quantifies how many of the generated .js inputs execute successfully, as opposed to terminating prematurely due to an error in the .js code. We attempted to reproduce these two major claims of the paper.
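For intuition, the snippet below sketches what this metric captures for a single generated sample; the `d8` path and file name are placeholders, and this is not the mechanism the fuzzers use internally to track the rate.

```sh
# Hypothetical check: a sample counts as semantically correct if the engine
# (here V8's d8 shell; the path is a placeholder) executes it without an
# uncaught error, i.e. exits cleanly.
if ./d8 sample.js > /dev/null 2>&1; then
    echo "sample.js executed successfully (semantically correct)"
else
    echo "sample.js terminated with an error"
fi
```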
We attempted to reproduce the improvements in code coverage. For this experiment, we ran both FuzzJIT and Fuzzilli for 24h with 10 repetitions each and evaluated the branch coverage of the resulting corpora with lcov.
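The commands below sketch how branch coverage can be collected with lcov from a coverage-instrumented (gcov) engine build after replaying a corpus; the build directory and output paths are illustrative and do not necessarily match the exact invocation used by our scripts.

```sh
# Illustrative only: capture coverage data written by the instrumented build
# and report the branch coverage summary. Paths are placeholders.
lcov --capture \
     --directory ./engine-build \
     --rc lcov_branch_coverage=1 \
     --output-file coverage.info
lcov --summary --rc lcov_branch_coverage=1 coverage.info
```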
In contrast to the authors, we cannot report an improvement in branch coverage.
The results from `results/correctness.json` paint a picture quite different from what the authors report: we cannot reproduce any improvement in the semantic correctness rate.
| Engine | Fuzzilli (reported) | FuzzJIT (reported) | Fuzzilli (measured) | FuzzJIT (measured) |
|---|---|---|---|---|
| JSC | 62.80% | 90.33% | 66.56% | 65.88% |
| V8 | 64.34% | 97.04% | 66.74% | 63.67% |
| SpiderMonkey | 64.13% | 93.28% | 67.47% | 63.93% |
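The measured columns can be recomputed from `results/correctness.json`, for example with jq; note that the field names below are assumptions for illustration and may not match the file's actual schema.

```sh
# Hypothetical: the fields "successful" and "total" are assumed, not taken
# from the real schema of results/correctness.json.
jq -r 'to_entries[]
       | "\(.key): \(.value.successful / .value.total * 100)% correct"' \
   results/correctness.json
```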
To reproduce our evaluation, follow these steps:
- Clone this repository:
  ```sh
  git clone git@github.com:fuzz-evaluator/fuzzjit-eval.git
  ```
- Execute the `run.sh` script.

  **Important note:** Running the script will first build the Docker container, consuming roughly 50GB of storage, and subsequently spawn 60 containers. Each of these containers is bound to a specific CPU (0-59) and runs for 24h. If your machine has fewer than 60 cores available, the script needs to be adapted; see the sketch below.
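The snippet below is a simplified sketch of the container-per-core pattern described above, showing which part would need adapting for machines with fewer cores; the image name and the command run inside the container are placeholders, not the actual contents of `run.sh`.

```sh
# Simplified sketch: pin one container per CPU core and stop it after 24h.
# Image name and fuzzing command are placeholders.
NUM_CORES=60   # lower this to match the cores available on your machine
for cpu in $(seq 0 $((NUM_CORES - 1))); do
    docker run --detach \
        --cpuset-cpus "${cpu}" \
        --name "fuzzjit-eval-${cpu}" \
        fuzzjit-eval-image \
        timeout 24h ./fuzz.sh
done
```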
The final results will be stored in `./results/`.