
      swizzle: 0
    }
  }
}
3.4 Performance improvement
In the last section, we discussed how to develop a structure-aware fuzzer against a third-party
library. In this section, we share some experience in improving the fuzzing speed. Our
customized fuzzer was first developed on a VMware Workstation virtual machine, where it
achieved only around 30 runs per second. Later we realized that some of the APIs are designed
for resource rendering and expect a genuine GPU device to help with the rendering task.
Inside a virtual machine only an emulated GPU is available, which is the main reason for
such disappointing speed. So we set up the fuzzer on a desktop with an i5-7500 CPU, an
Nvidia GTX 1080 Ti GPU, and 32 GB of RAM. Performance on this machine was much better,
at around 350 runs per second per thread. After that, we started to dig into the fuzzer to
see whether we could make it even faster.
We used Gperftools [9], a collection of performance analysis tools developed by Google,
to identify time-consuming operations. We modified the CMakeLists.txt of the fuzzer
project so that the binary target links against the gperftools static library, and compiled
a new fuzzer. We can then specify the name of the output file and start collecting
profiling data with the following command:
Listing 3.9: Run fuzzer with gperftools
CPUPROFILE=./perf.out ./fuzzer -detect_leaks=0 -max_total_time=60 corpus
We can generate reports in various file formats, as described in the gperftools
documentation [5]. We recommend generating a call graph in PDF format:
Listing 3.10: Generate a call graph with pprof
pprof -pdf ./fuzzer perf.out > call_graph.pdf
Each block in the call graph represents a function. The percentage on the last line of a
block shows how much CPU time was spent in that function, including the time spent in the
functions it calls. For example, in Figure 3.2 we can easily spot the entry function,
LLVMFuzzerTestOneInput, which occupied 79.1% of CPU time in this case. Following the
arrows, it calls TestOneProtoInput, which in turn calls three functions. Among these three,
vrend_renderer_init has the highest CPU cost, at 61.7% of the complete run. This implies
that if we can simplify the operations inside it, or even eliminate the calls altogether, we
are likely to obtain a substantial performance improvement.
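Amdahl's law, which the original analysis does not use, gives an idealized upper bound on the size of that improvement. Let p be the fraction of total CPU time spent in a function, and suppose that function's cost shrinks by a factor k while everything else stays the same. The overall speedup S is then:

```latex
S(k) = \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{1}{(1 - p) + p/k},
\qquad
\lim_{k \to \infty} S(k) = \frac{1}{1 - p}
  = \frac{1}{1 - 0.617} = \frac{1}{0.383} \approx 2.61
```

Take p = 0.617 for vrend_renderer_init. Removing the call entirely would, at best, make each run about 2.6 times faster. Treat this as an optimistic bound rather than a prediction. The profiler percentages come from sampling, and eliminating the call may shift some work elsewhere.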
In this version of our fuzzer (see Listing 3.5), we initialize a renderer at the beginning
of each run and clean it up at the end. These operations occupied up to 77.9% of CPU time
in a 60-second fuzzing test. Meanwhile, the target function fuzz_submit_cmd, which is what
we are really interested in, accounted for only 0.9% of CPU time. Our aim is to reduce the
time spent on setup and tear-down, and to increase the proportion of running time spent in
the target function.
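One common way to pursue this aim with libFuzzer is to move one-time setup into the optional LLVMFuzzerInitialize hook. libFuzzer calls this hook once before the first input, which leaves only the target call inside LLVMFuzzerTestOneInput. The sketch below is not the fuzzer from Listing 3.5, and it is not necessarily the approach taken later in this work. The functions hypothetical_renderer_init, hypothetical_renderer_cleanup and hypothetical_submit_cmd are made-up stubs that only count calls. They stand in for the real renderer setup, tear-down and fuzz_submit_cmd so the pattern can be compiled without a GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for the expensive renderer lifecycle (NOT the real
// virglrenderer API). They only record how often they run, so the pattern
// can be compiled and exercised without a GPU.
static int g_init_calls = 0;
static int g_cleanup_calls = 0;
static bool g_renderer_ready = false;

static void hypothetical_renderer_init() {
  ++g_init_calls;
  g_renderer_ready = true;
}

static void hypothetical_renderer_cleanup() {
  ++g_cleanup_calls;
  g_renderer_ready = false;
}

// Hypothetical target under test: requires an initialized renderer and
// returns the number of bytes accepted, or -1 if the renderer is not ready.
static int hypothetical_submit_cmd(const uint8_t *data, size_t size) {
  if (!g_renderer_ready || data == nullptr) return -1;
  return static_cast<int>(size);
}

// Optional libFuzzer hook, called once before the first input:
// the setup cost is paid once per process instead of once per run.
extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
  (void)argc;
  (void)argv;
  hypothetical_renderer_init();
  // Tear down once at process exit rather than after every input.
  std::atexit(hypothetical_renderer_cleanup);
  return 0;
}

// Called for every input: only the code we actually want to fuzz runs here.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  hypothetical_submit_cmd(data, size);
  return 0;
}
```

The trade-off is that renderer state now persists across inputs. A crash may then depend on earlier inputs and fail to reproduce when a single test case is replayed. A real harness would still need a cheap way to reset per-input state.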