Seq write #218
base: next
Conversation
…RPCs

Current scenario: the xprt socket fd is monitored via the epoll mechanism for the arrival of data on the socket. Once data arrives, an epoll event is generated and the socket fd is removed from the epoll fd. A new thread is chosen to monitor the epoll fd while the current thread receives one RPC's worth of data. Once the data is received, the current thread adds the socket fd back to the epoll fd so that another thread can start receiving data while the current thread continues executing the RPC.

Changed scenario: we no longer add the socket fd back to the epoll fd unconditionally once the data is read from the socket. We look at the RPC size; if it is more than 4 KB (which can only happen for a write RPC) and there is still data available to be read on the socket fd, the current thread keeps doing recv while dispatching the requests asynchronously to svc_request_async(). At most we recv max_cont_recv requests continuously (defaults to 64) in a single thread.

With this patch we are seeing a consistent sequential write performance improvement in the range of 15-20%.
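To make the description concrete, here is a minimal, hedged sketch of the changed control flow. svc_request_async(), the 4 KB threshold, and max_cont_recv come from the description above; every other identifier (recv_one_rpc, socket_has_more_data, rearm_epoll, rpc_msg_buf) is a hypothetical stand-in, not the actual ntirpc code.

```c
/* Sketch of the changed receive loop, not the actual patch.
 * svc_request_async() and the 4 KB / max_cont_recv checks come from the
 * PR description; every other identifier here is a hypothetical stand-in. */

#include <stdbool.h>
#include <stddef.h>

struct rpc_msg_buf;                      /* opaque stand-in for one RPC */

/* Hypothetical helpers standing in for the real ntirpc internals. */
extern struct rpc_msg_buf *recv_one_rpc(int fd, size_t *size_out);
extern bool socket_has_more_data(int fd);
extern void svc_request_async(struct rpc_msg_buf *req);
extern void rearm_epoll(int fd);

void recv_continuously(int fd, unsigned int max_cont_recv)
{
	unsigned int count = 0;

	for (;;) {
		size_t size = 0;
		struct rpc_msg_buf *req = recv_one_rpc(fd, &size);

		if (req == NULL)
			break;

		/* Dispatch asynchronously so this thread can keep reading. */
		svc_request_async(req);

		/* Keep the socket only for large (write) RPCs, only while
		 * data is already queued, and never for more than
		 * max_cont_recv requests in a row. */
		if (size <= 4096 || !socket_has_more_data(fd) ||
		    ++count >= max_cont_recv)
			break;
	}

	/* Add the fd back to the epoll fd so another thread can take over. */
	rearm_epoll(fd);
}
```

Note that setting the limit to 0 makes the loop break after the first RPC, which restores the original one-RPC-per-epoll behaviour discussed below.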
This raises issues that need to be discussed. The current scheme (one RPC per epoll iteration) is our only method of ensuring fairness between clients. If we continue to read from a single socket, that fairness guarantee is broken. Can we be sure this will be "fair enough"? It seems a "few" write-heavy clients could starve other clients out.
That is correct. That is why I have added this as a conf param. The old behaviour can be achieved by setting cont_recv_limit to 0. Also, I am looking for suggestion(s) on how to keep this as fair as possible while still retaining the performance improvement.
@dang @ashishsangwan2 nfs-ganesha actually preferred to continue reading on active sockets, but as @dang noted, fairness was a problem in older ganesha. I think the configurable approach here is fine, but possibly the default coefficient of 64 is a bit large?
One thing we can do is return the number of executing + pending requests to the caller of work_pool_submit(), i.e. svc_vc_recv(). Based on that, we can decide whether we should continue recv or yield.
Thanks @mattbenjamin for the reply. Would you please suggest what, in your opinion, would be a fair default value? Also, I would like to know your thoughts on returning pending requests as part of work_pool_submit() and whether we could make it a factor in deciding to continue recv(). For example: if (pending_request < x*ioq_thread_max), continue doing recv.
default value: maybe 4 or 8? It sounds helpful to consider pending_requests, but I'm not sure about the tuning. Maybe a fractional value would be safest.
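To make the idea above concrete, here is a hedged sketch of the gating check being discussed, assuming work_pool_submit() were extended to report the count of executing plus pending requests (it does not do so today). The helper name and the 0.5 coefficient are illustrative assumptions, not part of the patch.

```c
/* Sketch only: assumes work_pool_submit() were extended to return the
 * number of executing + pending requests, which it does not do today.
 * The helper name and the 0.5 * ioq_thread_max threshold are illustrative. */

#include <stdbool.h>

/* Hypothetical variant that submits work and reports queue depth. */
extern int work_pool_submit_counted(void *pool, void *work);

static bool
should_continue_recv(void *pool, void *work, int ioq_thread_max)
{
	int pending = work_pool_submit_counted(pool, work);

	/* Keep reading from this socket only while the thread pool is not
	 * already backed up, i.e. pending < 0.5 * ioq_thread_max. */
	return pending < ioq_thread_max / 2;
}
```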
Here's a quick summary of the performance numbers I got on my setup with different values of cont_recv_limit:
Also of interest, though, is the effect on concurrent clients.
@mattbenjamin |
Could you have some clients doing writes, and others doing something else, such as listing large directories? That's a nicely gated round-trip workload that could be easily starved. |
@dang Ok, I am going to measure the directory listing time (in the next week or so) for a directory with 100K entries while 0, 2, 4, and 6 clients do parallel sequential writes, with and without the patch.
I think we should abandon this one. The submitter has not gotten back to us.
Let's leave it for the moment, for future reference when we revisit queuing and whatnot. |
@dang @ffilz @malahal
Can you please review the changesets?
This is inspired by the Linux kernel client changeset:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sunrpc/xprtsock.c?id=8d6f97d698b2b9e71124ba95688a6bf1adad5055
Also, for the current changeset, I haven't yet submitted the Ganesha change for setting the TIRPC conf.
Though the default value for cont_recv_limit is 64, it gets reset to 0 by tirpc_control() once Ganesha starts.
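For reference, a hedged sketch of what that (not yet submitted) Ganesha-side setting might look like follows. tirpc_control() with TIRPC_GET_PARAMETERS/TIRPC_PUT_PARAMETERS exists in ntirpc today, but a cont_recv_limit member of tirpc_pkg_params is an assumption implied by this PR; the real knob's name and location may differ.

```c
/* Hedged sketch of the Ganesha-side change described above.
 * tirpc_control(), TIRPC_GET_PARAMETERS and TIRPC_PUT_PARAMETERS exist in
 * ntirpc, but the cont_recv_limit member is an assumption based on this PR;
 * the actual field name and struct may differ in the submitted change. */

#include <rpc/rpc.h>

static void
set_cont_recv_limit(unsigned int limit)
{
	struct tirpc_pkg_params params;

	/* Read the current package parameters, change only the new knob,
	 * and write them back; 0 restores the old one-RPC-per-epoll
	 * behaviour. */
	if (tirpc_control(TIRPC_GET_PARAMETERS, &params)) {
		params.cont_recv_limit = limit;	/* hypothetical field */
		(void)tirpc_control(TIRPC_PUT_PARAMETERS, &params);
	}
}
```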