Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensuring NIOAsyncChannel flushes all writes before closing #3049

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

adam-fowler
Copy link
Contributor

Currently NIOAsyncChannel can close the channel without having flushed all the writes it has made.

Motivation:

I believe there are two situations where writes are not getting written. At the point the NIOAsyncChannel starts the channel close procedure it calls finish on the writer which will resume all the pending continuations but then closes the channel immediately so those writes are never written. The second situation is related to the fact that there is no way to wait on a write actually making it all the way through the channel pipeline.

These proposed changes are more to start a discussion and this PR is still very much a draft.

Modifications:

Instead of passing Values through the NIOAsyncWriter we pass actions. There are currently three actions

  • write: write value to channel
  • writeAndFlush: write value to channel and flush the pipeline (returning when write promise either fails or succeeds).
  • flush: Ensure all previous writes have been written.

Result:

As long as the last write a user makes is with writeAndFlush everything will get written out before the channel is closed.

// This is always called from an async context, so we must loop-hop.
// Because we always loop-hop, we're always at the top of a stack frame. As this
// is the only source of writes for us, and as this channel handler doesn't implement
// func write(), we cannot possibly re-entrantly write. That means we can skip many of the
// awkward re-entrancy protections NIO usually requires, and can safely just do an iterative
// write.
self.eventLoop.preconditionInEventLoop()
guard let context = self.context else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has been moved into _doOutboundWrites as we need to complete promises even if the channel handler is no longer there.

// This is always called from an async context, so we must loop-hop.
// Because we always loop-hop, we're always at the top of a stack frame. As this
// is the only source of writes for us, and as this channel handler doesn't implement
// func write(), we cannot possibly re-entrantly write. That means we can skip many of the
// awkward re-entrancy protections NIO usually requires, and can safely just do an iterative
// write.
self.eventLoop.preconditionInEventLoop()
guard let context = self.context else {
// Already removed from the channel by now, we can stop.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has been moved into _doOutboundWrites as we need to complete promises even if the channel handler is no longer there.

@FranzBusch
Copy link
Member

Thanks for tackling this. I and the rest of the team agree that this is a problem but we are not yet sure how we want to fix it. There are three closely related problems that have to be tackled together to from a good overall async interop story.

  1. Our bootstraps don't provide a with-style scoped access to the channel.
  2. Our NIOAsyncChannel.outbound are dropping writes when executeThenClose is finished
  3. Any child channel async sequence like the one in HTTP2 is potentially hitting our fatalErrors when the NIOAsyncChannel is buffered and dropped

We want to tackle all of them together to provide an API that achieves:

  1. Great interop from NIO to async
  2. Maximum performance
  3. Clear semantics of when writes are fully written and closing
  4. Clear lifetime of channels

In this PR you are tackling the problem of the writes which I understand you are hitting in Hummingbird right now; however, the other open problems are unhandled. Just to be clear I don't expect you to tackle those. For the writes we have discussed four different approaches in the team so far:

  1. Introduce an explicit flush method on the NIOAsyncChannelOutboundWriter (this is basically your PR here)
  • The problem with this approach is that it extends the public API and users have to handle manual flushing again. Our goal was and is that this is happening implicitly and users can just write and we will make sure they are getting out appropriately
  1. Create a promise for every write and wait until the write has been written to the socket
  • This would work but it heavily penalises the performance by allocating a promise/future for every write. Even if we could avoid that allocation it basically requires a single write to be written to the socket before the next one can be enqueued which will limit the throughput. We can offer batch write APIs but we would also need to offer batch read APIs otherwise batch writing becomes almost impossible.
  1. Introduce a graceful close method on Channel and ChannelHandler
  • This could work but it we would need to implement it in all channels. Furthermore, the argument can be made that close(mode: .output) is basically that.
  1. Rely on the existing close(mode: .output)
  • This might work but requires additional work in the channels since users might still be interested in reading.

@adam-fowler
Copy link
Contributor Author

  1. Introduce an explicit flush method on the NIOAsyncChannelOutboundWriter (this is basically your PR here)
  • The problem with this approach is that it extends the public API and users have to handle manual flushing again. Our goal was and is that this is happening implicitly and users can just write and we will make sure they are getting out appropriately

The problem with not extending the public PR is you are trying to match a single write API to an api that includes a number of combinations (write, write and wait on promise, writeAndFlush, writeAndFlush and wait on Promise). You have to choose to map to one of these. Currently you are mapping to writeAndFlush.

The channel pipeline write APIs provide all this functional flexibility. Is there a reason NIOAsyncChannel is wanting to hide it all?

  1. Create a promise for every write and wait until the write has been written to the socket

Please do not do this. The performance implications are dreadful.

I did think about doing something a little more internal where any write that could be considered a final write would have a promise created for it internally, but the write function wouldn't wait on this promise completing but it would be stored "somewhere" and when we get to the executeAndClose {} closing phase it would wait on the last promise allocated for a write.

As I understand it the only way to verify a write has made it all the way through the pipeline is to create a promise for it. I think you'll need a higher level API to indicate the possibility of a write being the final write to avoid creating promises for ever write.

@Lukasa
Copy link
Contributor

Lukasa commented Jan 13, 2025

The channel pipeline write APIs provide all this functional flexibility. Is there a reason NIOAsyncChannel is wanting to hide it all?

The main reason is that one of the most common NIO bugs in real code is forgetting to flush.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants