Made code (mostly) runtime and transport independent #77
Overall this looks very promising. However, I have some performance-related concerns due to the change of codec type. It would be good if we could implement several simple performance tests (#44) and measure socket throughput before and after this change. I think we can implement such tests with https://github.com/bheisler/criterion.rs, which would help us track how our changes affect overall library performance. @TheButlah could you please take a look at these tests? I planned to start working on them, but for now I don't have a lot of free time unfortunately :( (I hope that I'll have more time available in November)
I absolutely agree about the performance tests, especially if we want to be on par with the performance of the C++ version. However, do we really need them now as opposed to when the codebase is more mature (or at least after I've implemented some of the other features that this enables :P)? The PR introduces two possible spots for minor slowdown - the dynamic dispatch that will happen for the boxed The dynamic dispatch can be solved by making the inner type a generic argument and bubbling up the generic args through the functions that use it. This would make the type signatures more complicated but is definitely possible. Another, probably better, alternative is to use "enum dispatch", which can be done whenever the number of potential implementations of the trait is known in advance. In this case we would have a fixed set of implementations, one for each stream type (async-std, tokio)*(tcp, wss, ipc,..) in an enum, and instead of boxing the trait object and doing dynamic dispatch, we have an enum of the correct type. This can be done easily with the https://docs.rs/enum_dispatch/0.3.3/enum_dispatch/ crate, and is supposed to be a lot faster than dynamic dispatch. With both of these approaches in our toolbox, I'm very confident that if we are concerned about the performance cost of dynamic dispatch, we could submit a later PR to bring the performance in line with the static dispatch approach. As for the TLDR I agree about the need for perf benchmarks, but I'm less enthusiastic about implementing them before working on some of the other stuff, mostly due to lack of experience writing perf benchmarks. Since the only theoretical performance regression is boxed dynamic dispatch, and since we can fix that in the future via either bubbling the generics up or using enum dispatch, could we tackle that issue later?
In fact, if you'd like, I could implement the enum dispatch approach now; that library I linked really does make it quite simple. Would that be satisfactory?
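For readers comparing the two options discussed above, here is a minimal hand-written sketch of boxed dynamic dispatch next to enum dispatch. It uses only the standard library: `TcpLike`, `IpcLike`, `BoxedStream`, `AnyStream`, and `drain` are hypothetical names invented for illustration, `std::io::Read` stands in for the async traits the crate would really use, and the `match` is written out by hand rather than generated by the `enum_dispatch` crate.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical stand-ins for two concrete transport stream types.
struct TcpLike(Cursor<Vec<u8>>);
struct IpcLike(Cursor<Vec<u8>>);

impl Read for TcpLike {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

impl Read for IpcLike {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

/// Dynamic dispatch: any `Read` type fits, and each call goes through a vtable.
struct BoxedStream(Box<dyn Read + Send>);

impl Read for BoxedStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

/// Enum dispatch: the set of implementations is closed and known in advance,
/// so each call is resolved by a `match` on the variant instead of a vtable.
enum AnyStream {
    Tcp(TcpLike),
    Ipc(IpcLike),
}

impl Read for AnyStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            AnyStream::Tcp(s) => s.read(buf),
            AnyStream::Ipc(s) => s.read(buf),
        }
    }
}

/// Reads a stream to the end; works the same for either wrapper.
fn drain<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}
```

Both wrappers expose the same `Read` interface, so callers such as `drain` do not change; the trade-off is that adding a new transport to `AnyStream` means adding a variant and a `match` arm, whereas `BoxedStream` accepts any new type without modification.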
The main issue I see is not just dynamic dispatch but rather 2 different implementations of Codec: If you don't want to work on tests, I can surely do it (because I already have it in my plans). In that case, can we keep this MR in a separate branch without merging for a couple of weeks? I'll try to add some benchmarks and compare these 2 branches as soon as I have some time to do actual coding.
Ah I see, I didn't realize the concern was tokio_codec vs futures_codec. I'm reading a bit more right now on criterion, and if the concern is just testing the difference between those two codec implementations, that's probably something easier to test. Lemme see what I can throw together in a separate MR that only switched from
Quick update: I've added #78 as a more limited version of this PR that only switched to futures_codec, without getting any of the runtime or transport agnostic stuff. I'll work on a basic performance benchmark for the codecs, submit a PR for that, and once that's merged we can rebase both #78 and the current MR to see what the perf implications are. However, in order to benchmark our stuff with criterion, we can only benchmark public functionality, and the codec stuff is private to the crate. @Alexei-Kornienko Would it be a good idea to break the codec stuff out into a separate crate and use cargo workspaces? After all, someone could theoretically want to use your codec implementation in other Rust zmq crates
@TheButlah I don't think that we need to benchmark codec directly. Instead I think we can write a generic benchmark. For example it could be very similar to our integration tests:
Run this test on the current master branch. Run the same test on your branch. If the results are roughly the same, then there is no visible degradation in performance.
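As a rough illustration of the "generic benchmark" idea, the sketch below times request/reply round trips over a loopback TCP connection. It is an assumption-laden stand-in: it uses plain `std::net` sockets and `std::time::Instant` instead of the crate's REQ/REP sockets and criterion, and `spawn_echo_server` and `time_round_trips` are hypothetical helper names. The shape is the point: run the same measurement on master and on the branch, then compare.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Starts an echo server standing in for a REP socket. It serves one
/// connection, echoing fixed-size messages; `msg_len` must be non-zero.
fn spawn_echo_server(msg_len: usize) -> io::Result<SocketAddr> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let mut buf = vec![0u8; msg_len];
            // Stop when the client disconnects or a write fails.
            while stream.read_exact(&mut buf).is_ok() {
                if stream.write_all(&buf).is_err() {
                    break;
                }
            }
        }
    });
    Ok(addr)
}

/// Sends `rounds` requests, waiting for each reply before sending the next,
/// and returns the total elapsed time.
fn time_round_trips(addr: SocketAddr, msg: &[u8], rounds: usize) -> io::Result<Duration> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?; // avoid Nagle delays on small messages
    let mut reply = vec![0u8; msg.len()];
    let start = Instant::now();
    for _ in 0..rounds {
        stream.write_all(msg)?;
        stream.read_exact(&mut reply)?;
        if reply.as_slice() != msg {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "echo mismatch"));
        }
    }
    Ok(start.elapsed())
}
```

A single timing like this is noisy; a harness such as criterion repeats the measurement many times and reports confidence intervals, which is why the thread proposes it for the real comparison.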
Related: #81
OK - I've rebased on the recently merged PRs, and the only thing left to do is to undergo review (make sure I didn't mess up anything in the rebase) as well as run our new benchmark to see if the dynamic dispatch causes a measurable performance regression.
    ❯ cargo bench -- --load-baseline runtime-agnostic --baseline master
        Finished bench [optimized] target(s) in 0.05s
         Running target/release/deps/req_rep-f8f5c926edd0add0
    Gnuplot not found, using plotters backend
    Bound rep socket to tcp://localhost:33241
    1 req and 1 rep messaging
                            time:   [14.679 ms 14.712 ms 14.746 ms]
                            change: [-2.6787% -2.3447% -2.0213%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 24 outliers among 512 measurements (4.69%)
      15 (2.93%) high mild
      9 (1.76%) high severe

Looks like this MR actually speeds things up - strange, given the dynamic dispatch. But I'll take it. @Alexei-Kornienko are we good to merge?
@TheButlah I rechecked again locally. I don't see any changes in performance, taking 3 runs on current master and 3 runs on your branch. Overall LGTM
In order to get the code to work in a way that is agnostic to a particular transport, as well as pave the way towards supporting both tokio and async-std, we need to refactor the code to stop using `TcpStream`s and `tokio_util::codec::Framed<TcpStream, ZmqCodec>` types where possible. The main contribution of this PR is as follows:
- `FramedIo` type, which is analogous to `tokio_util::codec::Framed`, except it is async-runtime independent, and also independent across transport types.
- `tokio::spawn()` calls and to convert one `tokio::select!` into `futures::select!`, both of which are trivial and can be done in a separate PR
- `util::start_accepting_connections()` to `transport::tcp::begin_accept()`. It now provides a more general callback that doesn't rely on any particular transport, runtime, or socket-specific data (for example the `PubSocket.backend` can be captured in the closure as opposed to passed along through the `begin_accept()` function).
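To make the callback design concrete, here is a minimal synchronous sketch of an accept loop that hands each connection to a caller-supplied closure. Everything in it is an assumption made for illustration: the PR's `begin_accept()` is async and its exact signature is not shown in this thread, `BoxedIo` is a hypothetical stand-in for a transport-independent wrapper (it is not the PR's `FramedIo`), and `RawStream`, `max_conns`, and `accept_one_demo` are invented names. The shared `peers` list plays the role of socket-specific state such as `PubSocket.backend`.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::{Arc, Mutex};
use std::thread;

/// Helper trait so a single box can hold any readable and writable transport.
trait RawStream: Read + Write + Send {}
impl<T: Read + Write + Send> RawStream for T {}

/// Hypothetical transport-independent connection handle.
struct BoxedIo(Box<dyn RawStream>);

/// Binds to `addr` and accepts up to `max_conns` connections on a background
/// thread, passing each one to `on_accept`. The accept loop knows nothing
/// about sockets or their state; the closure carries whatever it needs.
fn begin_accept<F>(
    addr: &str,
    max_conns: usize,
    mut on_accept: F,
) -> io::Result<(SocketAddr, thread::JoinHandle<()>)>
where
    F: FnMut(BoxedIo) + Send + 'static,
{
    let listener = TcpListener::bind(addr)?;
    let local = listener.local_addr()?;
    let handle = thread::spawn(move || {
        // `flatten` skips failed accepts; `take` bounds the loop.
        for stream in listener.incoming().take(max_conns).flatten() {
            on_accept(BoxedIo(Box::new(stream)));
        }
    });
    Ok((local, handle))
}

/// Usage: socket-specific state is captured by the closure rather than
/// threaded through `begin_accept` as an argument.
fn accept_one_demo() -> io::Result<usize> {
    let peers: Arc<Mutex<Vec<BoxedIo>>> = Arc::new(Mutex::new(Vec::new()));
    let backend = Arc::clone(&peers);
    let (addr, handle) = begin_accept("127.0.0.1:0", 1, move |io| {
        backend.lock().unwrap().push(io);
    })?;
    let _client = TcpStream::connect(addr)?;
    handle.join().expect("accept thread panicked");
    let count = peers.lock().unwrap().len();
    Ok(count)
}
```

Because the callback only receives a boxed stream, the same accept loop could in principle serve any transport that can produce one, which is the decoupling the description above is aiming for.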