Append Path performance comparison between Pravega Rust and Java client

Wenqi edited this page Dec 19, 2020 · 6 revisions

About the experiment

Goal

The append path is an essential part of overall Pravega performance, and we want to make sure the client is not the bottleneck. One reason for porting the Java client to Rust is that Rust is expected to perform much better than Java because it compiles directly to machine code. This experiment is an initial check of that presumption.

The design of the experiment

When writing to Pravega, the data flows roughly like this:

User code ---> Pravega Client ---> Kernel ---> Network ---> Pravega Segmentstore ---> Bookkeeper

In this experiment we measure only the latency of the following part of that path, with a mock Segmentstore in place of the real one:

Pravega Client ---> Kernel ---> Mock Pravega Segmentstore
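The measured path can be approximated with a loopback mock. The sketch below is not the actual benchmark: it assumes a plain TCP server that reads and discards bytes instead of speaking the Pravega wire protocol, and the names spawn_mock_server and time_appends are hypothetical. It only illustrates timing client writes through the kernel to a mock endpoint.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Starts a hypothetical mock server on an ephemeral loopback port.
/// It accepts one connection, counts the bytes it receives, and
/// returns the total once the client closes the connection.
fn spawn_mock_server() -> std::io::Result<(SocketAddr, thread::JoinHandle<usize>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let handle = thread::spawn(move || {
        let (mut conn, _) = listener.accept().expect("accept failed");
        let mut buf = [0u8; 64 * 1024];
        let mut total = 0usize;
        loop {
            match conn.read(&mut buf) {
                Ok(0) => break, // client closed the connection
                Ok(n) => total += n,
                Err(e) => panic!("read failed: {e}"),
            }
        }
        total
    });
    Ok((addr, handle))
}

/// Writes `count` events of `event_size` bytes and returns the elapsed
/// time together with the number of bytes the mock server received.
fn time_appends(count: usize, event_size: usize) -> std::io::Result<(Duration, usize)> {
    let (addr, server) = spawn_mock_server()?;
    let mut stream = TcpStream::connect(addr)?;
    let event = vec![0u8; event_size];
    let start = Instant::now();
    for _ in 0..count {
        stream.write_all(&event)?;
    }
    stream.flush()?;
    drop(stream); // closing the socket lets the server see EOF
    let received = server.join().expect("server thread panicked");
    Ok((start.elapsed(), received))
}

fn main() -> std::io::Result<()> {
    for count in [1_000, 10_000] {
        let (elapsed, received) = time_appends(count, 100)?;
        println!("{count} writes: {elapsed:?} ({received} bytes received)");
    }
    Ok(())
}
```

Because the timer stops only after the server has drained the socket, the measurement covers the full client-to-mock path rather than just the cost of handing bytes to the kernel.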

How to run the benchmark

Navigate to the project root directory and run:

cargo bench

For the Java client benchmark, see the self test tool.

How to run Flamegraph to diagnose performance bottlenecks

Install Flamegraph.

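Note that you may need to set kernel.perf_event_paranoid to 0 so that perf can collect CPU event data. One possible way is sudo sh -c "echo 0 > /proc/sys/kernel/perf_event_paranoid". To resolve kernel symbols you may also need to set kernel.kptr_restrict to 0, for example with sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict".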

In theory, cargo flamegraph --bench benchname should work. If it doesn't, you can run flamegraph /path/to/executable to get the result. By default the result is written to flamegraph.svg in the project directory.

Result

CPU: Intel Core i7-8700K, 3.70 GHz

| Writes (100-byte events) | Rust client | Java client |
|--------------------------|-------------|-------------|
| 1K                       | 1.98 ms     | 164 ms      |
| 10K                      | 20 ms       | 408 ms      |
| 100K                     | 204 ms      | 2391 ms     |
| 1M                       | 1913 ms     | 15123 ms    |

The Rust client's latency grows linearly with the load, at roughly 2 µs per write at every size. The Java client carries a large fixed overhead on small workloads (about 164 µs per write at 1K writes) but amortizes it as the workload grows (about 15 µs per write at 1M writes). This is probably because the Java client is already well optimized and the Rust client is not yet. Even so, the unoptimized Rust client beats the Java client at every size, by about 83x at 1K writes and about 8x at 1M writes, and it may do better still after performance tuning.
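The per-write cost and the Java-to-Rust ratio can be derived from the results table. The following is a small sketch of that arithmetic; the Row struct and the helper names are hypothetical, and the numbers are copied from the table.

```rust
/// One row of the results table: number of writes and total latency in ms.
struct Row {
    writes: u64,
    rust_ms: f64,
    java_ms: f64,
}

/// Average cost of a single write in microseconds.
fn per_write_us(total_ms: f64, writes: u64) -> f64 {
    total_ms * 1000.0 / writes as f64
}

/// How many times longer the Java client took than the Rust client.
fn ratio(row: &Row) -> f64 {
    row.java_ms / row.rust_ms
}

fn main() {
    let rows = [
        Row { writes: 1_000, rust_ms: 1.98, java_ms: 164.0 },
        Row { writes: 10_000, rust_ms: 20.0, java_ms: 408.0 },
        Row { writes: 100_000, rust_ms: 204.0, java_ms: 2391.0 },
        Row { writes: 1_000_000, rust_ms: 1913.0, java_ms: 15123.0 },
    ];
    for row in &rows {
        println!(
            "{:>9} writes: Rust {:.2} us/write, Java {:.2} us/write, ratio {:.1}x",
            row.writes,
            per_write_us(row.rust_ms, row.writes),
            per_write_us(row.java_ms, row.writes),
            ratio(row),
        );
    }
}
```

A flat per-write cost indicates linear scaling, while a falling per-write cost indicates a fixed overhead being amortized over more writes.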

Optimization (serde_bytes): Use the serde_bytes crate to serialize and deserialize Vec<u8> and &[u8]. This crate provides a more efficient serialization path for byte slices and byte vectors.
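With serde, this optimization is usually applied by annotating a byte field with #[serde(with = "serde_bytes")]. The std-only sketch below does not use serde or serde_bytes. It only illustrates the difference between visiting every byte as a separate sequence element and writing the whole slice in one call. The length-prefixed little-endian layout and both function names are assumptions made for illustration, not Pravega's wire format.

```rust
/// Element-by-element path: roughly what a generic sequence serializer
/// does for a Vec<u8>, handling every byte as its own element.
fn encode_as_seq(data: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    for &byte in data {
        out.push(byte);
    }
}

/// Bulk path: roughly what a byte-aware serializer can do, copying
/// the whole slice in a single call.
fn encode_as_bytes(data: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
}

fn main() {
    let event = vec![7u8; 100];
    let mut seq_encoded = Vec::new();
    let mut bulk_encoded = Vec::new();
    encode_as_seq(&event, &mut seq_encoded);
    encode_as_bytes(&event, &mut bulk_encoded);
    // Both paths produce identical bytes; only the work per byte differs.
    assert_eq!(seq_encoded, bulk_encoded);
    println!("both paths produce {} bytes", seq_encoded.len());
}
```

In this toy version the compiler may optimize the loop well. In a real serializer the per-element path goes through a serializer call for each byte, which is the overhead a byte-aware path avoids.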