This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Sometimes it's important to check for socket writeability before trying to write #446

Open
njsmith opened this issue Oct 16, 2016 · 7 comments

Comments

@njsmith

njsmith commented Oct 16, 2016

I recently discovered that Linux/OS X provide an important API (TCP_NOTSENT_LOWAT) that lets applications avoid queuing up excessive data inside the kernel's socket send buffers. (The socket send buffers are generally too big, for various reasons.) Unfortunately, it turns out that this API works by controlling when a socket is marked writeable by select and friends, but does not affect whether a send call will succeed, so while you might think these are the same thing they actually aren't. [Edit: it turns out that this description is actually incorrect on Linux, though probably true on macOS -- see] I initially filed a bug on curio about this because curio was assuming they were the same, so I won't repeat all the details: dabeaz/curio#83
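To make the distinction concrete, here is a minimal Python sketch of "check writeability first, then send". It is not code from curio or asyncio; the helper names `set_notsent_lowat` and `send_if_writable` are made up for illustration, and it assumes the running Python exposes `socket.TCP_NOTSENT_LOWAT` (if it doesn't, the option is simply skipped).

```python
import select
import socket

# Assumption: the constant is only used if this Python build exposes it.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", None)


def set_notsent_lowat(sock, nbytes):
    """Try to set TCP_NOTSENT_LOWAT on sock; return True if it was applied."""
    if TCP_NOTSENT_LOWAT is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
    except OSError:
        # Not a TCP socket, or the platform rejects the option.
        return False
    return True


def send_if_writable(sock, data, timeout=0.0):
    """Ask select() whether sock is writeable before calling send().

    Returns the number of bytes sent, or 0 if the socket was not reported
    writeable within `timeout` seconds. An "eager" writer would skip the
    select() call and go straight to send().
    """
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return 0
    return sock.send(data)
```

An eager-write implementation calls `send` directly and only falls back to waiting when it gets `EWOULDBLOCK`, which is exactly the path on which a writeability threshold would be bypassed.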

@dabeaz points out that asyncio seems to make the same invalid optimization, so filing a bug here too.

@njsmith
Author

njsmith commented Oct 17, 2016

On further discussion (see the curio issue), it sounds like the tentative conclusion is:

@gvanrossum
Member

gvanrossum commented Oct 17, 2016 via email

@gjcarneiro

I guess this is trying to address the buffer bloat problem?...

@njsmith
Author

njsmith commented Oct 18, 2016

@gjcarneiro: bufferbloat is a many-headed hydra, but yeah, this is about bufferbloat in the context of per-socket send buffers specifically. The discussion thread on the curio issue has lots more details.

@glyph

glyph commented Oct 26, 2016

> @glyph has this reached Twisted yet?

Not TCP_NOTSENT_LOWAT, no. I'm sort of curious how our producer/consumer API interacts with this detail; I have a feeling it'll behave correctly, but I'm not entirely sure.

However, in the process of investigating this, I learned that we apparently removed the eager-write optimization many years ago:

twisted/twisted@c75d1eb

Digging into the history and viewing some of the discussion around that time, it seems that we were aware that it punished us pretty brutally on certain micro-benchmarks, but there's no realistic benchmark we could find where it impacts performance significantly. @dabeaz points out over on the other ticket that it's a massive performance penalty to an echo-server benchmark, and that's true; however, echo is not a realistic application.

If you want to do anything interesting you need to talk to at least one other back-end service, which means that you need to carefully manage the relationship between two transports, which means you need a producer/consumer hookup. Once you have that, you can't really get the meat of the optimization that eager-writes give you, which is the ability to avoid the extra select/epoll/kqueue(etc) syscall between recv and send, since you need to go back to the main loop to see if it's time to read again between each packet anyway.

It also does punish the writer on benchmarks where you are synthesizing data on the CPU rather than getting it or processing it from a different remote source, but /dev/urandom as a service also has pretty limited utility.

That said, I don't think Twisted is a great model to look towards for good support for tunables; tuning has historically been a weak point for us, because users who have significant performance demands almost always end up fixing them by making scaling up and down easier rather than optimizing throughput. Also, the only application where this sort of tuning tends to make any difference is something that is just shuttling around huge volumes of data without really processing it, and if you're doing that you're more likely to use HAProxy or something.

Still, I really appreciate learning about this nuance of send on Linux. Hopefully at some point in the coming year we're going to do an overhaul of how we deal with tunable transport parameters (mostly focused on the more-portable SO_SNDBUF and SO_RCVBUF than this platform-specific detail) and it'll be good to keep it in mind for that.

@Lukasa

Lukasa commented Nov 15, 2016

I should note that I have an interest in adding support for TCP_NOTSENT_LOWAT to Twisted because it's highly valuable for HTTP/2, where keeping send buffers small helps prevent control frames from getting blocked behind buffered stream data. That means asyncio is likely to want to provide support for APIs of this kind as well.

However, I disagree with @njsmith's assertion that asyncio just wants to start using it by default. In particular, for bulk unframed data transfers where throughput is more important than reactivity, applications will want to avoid spinning up the Python event loop wherever possible: for that reason, large writes are ideal and using TCP_NOTSENT_LOWAT with a bad value will have nasty negative performance impacts. The biggest case of this is for protocols like FTP and HTTP/1.1, particularly when sendfile is not available to the application, where we want to free the event loop up to do other things rather than repeatedly send smallish writes into the kernel.

In the worst case of an event loop at 100% CPU utilisation, aggressively low values of TCP_NOTSENT_LOWAT can lead to pauses in data transfer because the event loop isn't able to respond to the POLLOUT event before the kernel send buffer empties entirely.

It is much better for asyncio to expose this kind of tuneable rather than opt into it by default. Let application developers decide what the performance characteristics of their protocols should be.
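As a rough sketch of what application-level opt-in could look like with what asyncio offers today: a protocol can fetch the underlying socket via `transport.get_extra_info("socket")` and set the option itself. The helper `apply_notsent_lowat`, the `LowLatencyProtocol` class, and the 16 KiB default are all illustrative assumptions, not an existing or proposed asyncio API.

```python
import asyncio
import socket

# Assumption: the constant is only used if this Python build exposes it.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", None)


def apply_notsent_lowat(transport, nbytes):
    """Hypothetical helper: set TCP_NOTSENT_LOWAT on a transport's socket.

    Returns True if the option was applied, False if the transport has no
    socket or the platform does not support the option.
    """
    sock = transport.get_extra_info("socket")
    if sock is None or TCP_NOTSENT_LOWAT is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, nbytes)
    except OSError:
        return False
    return True


class LowLatencyProtocol(asyncio.Protocol):
    """Illustrative protocol that opts in to a small unsent-data threshold."""

    def __init__(self, lowat=16 * 1024):
        self.lowat = lowat  # arbitrary example value, meant to be tuned
        self.lowat_applied = False
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport
        self.lowat_applied = apply_notsent_lowat(transport, self.lowat)
```

A bulk-transfer protocol would simply not do this, or would pass a much larger value, which keeps the choice with the application.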

@njsmith
Author

njsmith commented Nov 15, 2016

Ah, but that can be handled by the library too. On OS X, the splitting of
large writes isn't an issue at all, since TCP_NOTSENT_LOWAT only affects
select-and-friends, not send-and-friends. And in Linux, you can achieve the
same effect by having your send routine do: (1) turn off TCP_NOTSENT_LOWAT,
(2) call send, (3) turn it on again. The basic intuition here is that you
want to let the send buffer drain before signaling writeability to avoid
standing buffers, but once the application has committed to sending a large
chunk of data, you want to hand that off to the kernel as quickly as
possible, even if that does temporarily create a large buffer.

I agree that the actual TCP_NOTSENT_LOWAT value should be tuneable, and
that this is a somewhat experimental proposal. But theoretically at least
it seems like there are some pretty compelling arguments that the best
default value for TCP_NOTSENT_LOWAT is smaller than the "infinity" we
currently default to.
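The three-step Linux workaround described above (turn the limit off, send, turn it back on) might look roughly like this. The function name `send_large` is hypothetical, and the sketch assumes that setting the option to 0 restores the system-wide default (normally "no limit"); that assumption is exposed as the `off_value` parameter so it can be changed.

```python
import contextlib
import socket

# Assumption: the constant is only used if this Python build exposes it.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", None)


def send_large(sock, data, lowat, off_value=0):
    """Send data with the unsent-data limit temporarily lifted.

    (1) lift the TCP_NOTSENT_LOWAT limit, (2) call send(), (3) restore
    `lowat`. `off_value=0` is an assumption about how to disable the
    per-socket limit. Returns the number of bytes accepted by send().
    """
    can_tune = TCP_NOTSENT_LOWAT is not None
    if can_tune:
        try:
            sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, off_value)
        except OSError:
            # Not a TCP socket or option unsupported: fall back to plain send.
            can_tune = False
    try:
        return sock.send(data)
    finally:
        if can_tune:
            # Restore the low-water mark even if send() raised.
            with contextlib.suppress(OSError):
                sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat)
```

The writeability check with the low threshold still governs when the application is asked for more data; only the hand-off of an already committed chunk bypasses it.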

