Support channel.close in the middle of a publish #11210
Conversation
Tests are coming. We may generalise this to all method frames as per the spec; channel.close alone was the minimal change to solve our case.
The incomplete method is thrown away, and the following non-content method (e.g. channel.close) is processed normally.
This is a highly hypothetical scenario that hasn't come up since 2007 (that I recall). I'd like to see evidence that it does not introduce any performance regressions.
The only scenario where a client can run into something like this is publishing from multiple threads on a shared channel, and sharing channels across threads like that is explicitly not supported in any of the clients. So it must only be proxies, where N client connections are multiplexed onto 1 actual upstream connection.
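For illustration, here is a hedged sketch of how such thread interleaving would corrupt the frame stream (the frame tuples are illustrative, not the AMQP 0-9-1 wire format):

```python
# Hedged illustration: two threads publishing on the same channel without
# synchronisation can interleave the frames of two messages. Frame tuples
# are made up for illustration, not the real wire encoding.

def publish_frames(body):
    # A basic.publish is a method frame, a content header frame, then
    # one or more body frames.
    return [("method", "basic.publish"),
            ("header", len(body)),
            ("body", body)]

t1 = publish_frames(b"from-thread-1")
t2 = publish_frames(b"from-thread-2")

# One possible interleaving on the shared socket: thread 2's method frame
# lands between thread 1's header and body, which the broker rejects as an
# UNEXPECTED_FRAME connection-level error.
interleaved = [t1[0], t1[1], t2[0], t2[1], t1[2], t2[2]]
```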
We will execute performance tests to ensure there is no performance impact. I am certain the code can be structured so that only the code path that would hit the frame error anyway is modified, not the happy path.

In a follow-up commit I modified the code to not special-case the channel.close method but to act the same way for any non-content method, as the spec describes. However, that might lead to behaviour some would consider unexpected or confusing, e.g. prematurely terminating a publish with a queue.declare: the publish might seem to be lost while the channel is still up. Therefore I can accept restricting the behaviour to channel.close, where the affected channel won't stay open anyway.

I wasn't sure where to place the tests so I just put them in a new suite. For the record, without this patch the suite fails the following way:
@gomoripeti I expect that this version, with […]. Like I said earlier, I don't see how a realistic client can inject a […]. If the scope of this continues to creep, we will close this and ask you to solve this in your proxy, which is the only affected project if you think about it.
As far as channel parser states go, this can be implemented as
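One way to picture those parser states is a small per-channel state machine. This is a hedged Python sketch, not the Erlang implementation; the state names, frame tuples, and the `allow_interrupt` flag are all made up for illustration:

```python
# Sketch of a per-channel content-assembly state machine. With
# allow_interrupt=True (the behaviour this PR proposes), a non-content
# method arriving mid-content discards the partial message instead of
# triggering a connection-level frame error.

def run_channel(frames, allow_interrupt=False):
    """Process a list of (frame_type, payload) tuples for one channel.

    Returns the fully assembled message bodies, or raises ValueError to
    model RabbitMQ's UNEXPECTED_FRAME connection error.
    """
    state = "idle"        # idle -> want_header -> want_body -> idle
    remaining = 0
    messages, body = [], b""

    for ftype, payload in frames:
        if ftype == "method":
            if state != "idle":
                if not allow_interrupt:
                    raise ValueError("UNEXPECTED_FRAME: connection closed")
                # Patched path: throw the incomplete message away and
                # process the method (e.g. channel.close) normally.
                state, body = "idle", b""
            if payload == "basic.publish":
                state = "want_header"
        elif ftype == "header":
            remaining = payload  # payload models the declared body size
            if remaining:
                state = "want_body"
            else:
                messages.append(body)
                state = "idle"
        elif ftype == "body":
            body += payload
            remaining -= len(payload)
            if remaining <= 0:
                messages.append(body)
                state, body = "idle", b""
    return messages
```

With the default (current broker behaviour) the interrupted sequence raises; with `allow_interrupt=True` the partial publish is simply dropped and the channel.close is handled normally.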
However, the reasoning in section 4.2.6 as quoted above makes no practical sense. In my 14 years as a RabbitMQ contributor I have never seen a user try to "terminate content" like that. Sorry, but it's just too hard to justify a change like this that will potentially affect every single user. If an AMQP 0-9-1 proxy developer chooses to adhere to section 4.2.6, they can. Otherwise this is a clear example of overspecialization for one specific user, and we have previously rejected such PRs, even if they would potentially benefit more than the users of a single proxy (e.g. #10293).
Sorry, the more I think about how this can be done, the more I find section 4.2.6 ridiculous and this change inevitably risky. We have deviated from the AMQP 0-9-1 spec in the past for practical implementation reasons and for questionable or ambiguous decisions in that frozen-in-time spec, and I have no problem with that.
What a proxy could do to avoid an exception is something like this:
This will not allow clients to "terminate a publish" early, but it will be a safe thing to do. To handle potentially missing body frames, introduce a "flushing timeout" after which all pending frames are sent. This would not be a safe thing to do, though, so such a timeout would have to match the heartbeat or TCP keepalive timeout used. It's not the most straightforward algorithm to implement, but I honestly don't see why cloudamqp/amqproxy#162 should be solved in RabbitMQ if the user very clearly states that directly connected clients are not affected.
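The buffering approach described above could look roughly like this. This is a Python sketch under stated assumptions: `ChannelBuffer`, the frame tuples, and the timeout policy are all illustrative, not AMQProxy's actual code:

```python
import time

# Hedged sketch of the proxy-side workaround: buffer a publish's frames per
# upstream channel and only forward them once the body is complete, so an
# incomplete publish never reaches the broker. Names are illustrative.

class ChannelBuffer:
    def __init__(self, flush_timeout):
        # Should match the heartbeat or TCP keepalive timeout in use.
        self.flush_timeout = flush_timeout
        self.pending = []       # frames of an in-flight publish
        self.expected = 0       # declared body size from the header frame
        self.received = 0       # body bytes seen so far
        self.started = None

    def feed(self, ftype, payload):
        """Return the list of frames now safe to forward upstream."""
        if ftype == "method" and payload == "basic.publish":
            self.pending = [(ftype, payload)]
            self.expected = self.received = 0
            self.started = time.monotonic()
            return []
        if self.pending:
            self.pending.append((ftype, payload))
            if ftype == "header":
                self.expected = payload
            elif ftype == "body":
                self.received += len(payload)
            if ftype != "method" and self.received >= self.expected:
                done, self.pending = self.pending, []
                return done      # publish complete; forward it whole
            return []
        return [(ftype, payload)]  # non-publish traffic passes through

    def overdue(self):
        """True when an incomplete publish has exceeded the flush deadline."""
        return bool(self.pending) and (
            time.monotonic() - self.started > self.flush_timeout
        )
```

A channel.close arriving after a completed publish passes straight through; one arriving mid-publish would leave the partial frames in the buffer to be discarded or flushed on the deadline.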
Sorry, the PR was made more generic than intended. We only want to support […]. It's not practical in a proxy or pool to buffer full messages; they can be tens of MB large (this is more common than you would think), exploding memory usage when you have potentially hundreds of thousands of downstream clients. You're wrong to assume that cloudamqp/amqproxy#162 is related; this is a different and very generic issue for anyone implementing a channel pool. The branch has been updated with 137315e but it doesn't show up in the PR as it's already closed.
The generalisation was just an early experiment on my side, and by the time I pushed the tests I realised it is a bad idea to interleave frames of multiple methods on the same channel. We don't have any use case for that, so let's forget about that part and about the spec. But as Carl wrote, it's useful to keep connections long-lived, avoiding reconnects (which are expensive at the TCP, TLS and AMQP levels) and connection churn. As an alternative (only theoretical), the broker could support a channel.close(ChannelId) method on channel 0, so that a client-side "channel manager" could terminate individual channels without keeping much state about them. But I think the proposed change in this branch has a much smaller diff and less impact.
Proposed Changes
RabbitMQ isn't respecting section 4.2.6 in the spec: https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf
Instead, any non-body frame after a header frame closes the whole connection.
This PR allows channel.close frames to cancel an ongoing publish.
Types of Changes
Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document

Further Comments
In AMQProxy we pool channels from many different downstream clients on a single upstream connection. If any of those downstream clients disconnects without having finished a full publish, the proxy tries to close the upstream channel, but RabbitMQ then closes the whole connection, and with it all other channels from all other downstream clients.
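A toy model of that failure mode; `UpstreamConnection` and its methods are hypothetical names, and the `close_channel_only` flag models the behaviour this PR proposes:

```python
# Hedged sketch: N downstream clients multiplexed onto one upstream
# connection. Today, a frame error caused by one interrupted publish takes
# down the connection, and with it every other channel. All names here are
# illustrative, not AMQProxy or RabbitMQ code.

class UpstreamConnection:
    def __init__(self, close_channel_only=False):
        self.close_channel_only = close_channel_only  # PR's proposed behaviour
        self.channels = {}          # channel_id -> "open" | "closed"
        self.open = True

    def open_channel(self, ch):
        self.channels[ch] = "open"

    def interrupted_publish(self, ch):
        """A downstream client vanished mid-publish on channel `ch`."""
        if self.close_channel_only:
            # Patched: only the affected channel is lost.
            self.channels[ch] = "closed"
        else:
            # Current: UNEXPECTED_FRAME closes the whole connection.
            self.open = False
            self.channels = {c: "closed" for c in self.channels}

conn = UpstreamConnection()
for ch in (1, 2, 3):
    conn.open_channel(ch)
conn.interrupted_publish(2)
# conn.open is now False: all three downstream clients are disconnected.
```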