-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(request-response): Add support for custom dial options when making request to disconnected peer #5692
base: master
Are you sure you want to change the base?
Conversation
…ng request to disconnected peer
d8d075f
to
05fe824
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't in the community call where this was discussed, so apologies if this has already been talked through.
But why isn't adding the address of the peer to the swarm (and thus torequest_response::Behavior.addresses
) an option?
You mention in #5634 that doing it that way is annoying, but would you mind expanding why?
/// > **Note**: In order for such a dialing attempt to succeed, | ||
/// > the `RequestResonse` protocol must either be embedded | ||
/// > in another `NetworkBehaviour` that provides peer and | ||
/// > address discovery, or known addresses of peers must be | ||
/// > managed via [`Behaviour::add_address`] and | ||
/// > the `peer` must be [`DialOpts`] with multiaddresses or | ||
/// > in case of simple [`PeerId`] `RequestResponse` protocol | ||
/// > must either be embedded in another `NetworkBehaviour` | ||
/// > that provides peer and address discovery, or known addresses of | ||
/// > peers must be managed via [`Behaviour::add_address`] and | ||
/// > [`Behaviour::remove_address`]. | ||
pub fn send_request(&mut self, peer: &PeerId, request: TCodec::Request) -> OutboundRequestId { | ||
pub fn send_request<Peer>(&mut self, peer: Peer, request: TCodec::Request) -> OutboundRequestId | ||
where | ||
DialOpts: From<Peer>, | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current description and usage of peer
is a bit confusing IMO as it can be a peer id,multiaddress, or DialOpts
.
What do you think of adding instead a separate function:
pub fn send_request_with_dial_ops(&mut self, dial_ops: DialOpts, ...)
.
That way no From<&PeerId>
for DialOpts or breaking change on send_request
would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed various options during community call and my understanding was that it is desirable to not have multiple methods and it is also desirable to keep it backwards compatible in terms of API. I do not have strong opinion though and happy to change if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why isn't adding the address of the peer to the swarm (and thus to
request_response::Behavior.addresses
) an option?
You mention in #5634 that doing it that way is annoying, but would you mind expanding why?
The logic is external to the swarm and implemented using a custom request/response protocol.
We have convenient abstractions that allow us to implement request-response protocols with a single async callback comparing to writing libp2p's behaviors.
It is not needed for anything except a single request, hence no inherent need to store it anywhere in the swarm.
If I was to add a one-time-use peer to the swarm I'd have to:
- implement a custom behavior that will actually store it
- Make a call into behavior before request to make sure request-response can make a dial if it needs to
- Remove address from the behavior right after this request in order to avoid leaking memory
This essentially means a ton of boilerplate that I do not need to do if I have the ability to simply specify the address to dial explicitly.
/// > **Note**: In order for such a dialing attempt to succeed, | ||
/// > the `RequestResonse` protocol must either be embedded | ||
/// > in another `NetworkBehaviour` that provides peer and | ||
/// > address discovery, or known addresses of peers must be | ||
/// > managed via [`Behaviour::add_address`] and | ||
/// > the `peer` must be [`DialOpts`] with multiaddresses or | ||
/// > in case of simple [`PeerId`] `RequestResponse` protocol | ||
/// > must either be embedded in another `NetworkBehaviour` | ||
/// > that provides peer and address discovery, or known addresses of | ||
/// > peers must be managed via [`Behaviour::add_address`] and | ||
/// > [`Behaviour::remove_address`]. | ||
pub fn send_request(&mut self, peer: &PeerId, request: TCodec::Request) -> OutboundRequestId { | ||
pub fn send_request<Peer>(&mut self, peer: Peer, request: TCodec::Request) -> OutboundRequestId | ||
where | ||
DialOpts: From<Peer>, | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed various options during community call and my understanding was that it is desirable to not have multiple methods and it is also desirable to keep it backwards compatible in terms of API. I do not have strong opinion though and happy to change if necessary.
The request response behavior already persist explicitly added addresses. You just need to call
rust-libp2p/protocols/request-response/src/lib.rs Lines 798 to 809 in b187c14
So there the reported address should then just be added to self.addresses , and when sending a request the address will then be used.Edit2: The request-response behavior does handle relevant swarm events and updates its addresses based on it: rust-libp2p/protocols/request-response/src/lib.rs Lines 798 to 799 in b187c14
|
There are multiple issues with that:
All for what? I just want to make a request and I happen to know the address to dial too, it is too much wasted CPU and brain cycled doing useless things comparing to what this PR offers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Nazar!
Thanks for the feedback, I wasn't aware of what @elenaf9 referred with Swarm::add_peer_address
and #4302.
But as I referred in the call we had, I'd like to follow #4302 and try to address this at the Swarm
layer instead of Behaviour
layer.
with that in mind and the topics you laid:
Those addresses are stored in PeerAddresses, which is initialized with defaults, meaning LRU with 100 entries (we sometimes dispatch A LOT more requests than that)
that is easily addressable right? We can expose choosing the size for the LruCache
inside PeerAddresses
via Config
There is still a need to add addresses there manually (which will also propagate to behaviors that I don't want it to propagate to)
this is interesting, can you explain why propagating to other hypothetical behaviours, (and which) would be a disadvantage to you?
There is no way to ever delete peers from that cache once .remove_address() is gone (already deprecated)
Can you also explain why removing would help you? #4302 doesn't mention it, but I don't see an objection to also remove addresses at the Swarm
level with a FromSwarm::RemovedExternalAddrOfPeer
event, which I also don't think would affect #4103 negatively
Thanks!
It is not exposed right now and generally I see LRU cache as a hack when you can't specify the exact logic explicitly and instead just slap a heuristic number on it and hope it will work out. To be clear, it is a reasonable way to reduce complexity, but it is not optimal either for specialized use cases like this one. It might work with large enough limit, but such limit will need to be correlated with other parameters elsewhere, increasing swarm configuration complexity and maintenance cost. It will also use more memory than it should and will use unnecessary CPU cycles in the process, not to say the data flow becomes much more convoluted than it needs to be on top of that.
The better question is why would I want to do that to begin with (except the design of request-response that mandates this right now)? The disadvantages are many:
I simply want for dial to not fail if the address of the peer is not already known to existing behaviors, just dial the explicitly provided address, as simple as that. I also can't have a shortcut and not add address and not pay the cost of triggering logic in behaviors by checking if established or pending connection is there without writing even more code and spending even more RAM and CPU cycles (I tried that initially and quickly abandoned because I had to store additional hashmaps with non-trivial logic).
As mentioned many times already, these addresses are 100% useless for anything except the request that I want to make. So if I want to keep memory usage under control, I need to clean up the things that I add after they are no longer needed (meaning after request-response request, which is the only purpose here). This gets even more complex though because I need to know if the address wasn't added by something else prior (or else I'll remove it unnecessarily) and handle potential concurrent requests gracefully as well, all of which is very non-trivial. If this PR goes through none of that is needed and indeed #4302 doesn't inherently require removing anything because it doesn't inherently require adding anything either, only current design of request-response does. I would even go as far as saying that this cache in request-response is not a feature, but rather a bug, it should be removed without replacement. libp2p users can store addresses anywhere in any way they want (including a simple sidecar behavior) and decide if/how they want to limit number of addresses (LRU, explicit removal, etc.) instead of this storage being forced only everyone unconditionally. |
Addresses are added to behaviors quite frequently because we generally build on the assumption that more "knowledge" about peer addresses is favorable. Compared to the workload and memory consumption of establishing a whole connection, I'd argue that reporting/ storing/ deleting a single address is rather cheap. For instance, the Behaviors can also just ignore the reported address, which most actually do. For request-response specifically, I think the address cache does make sense and should not be removed. IMO network behaviors should be self contained in the sense that it should possible to use a behavior on its own in the swarm, which wouldn't be possible for request-response when there is no way of storing an address. |
I see your point and agree it is not necessarily particularly expensive, but things that allocate (like LRU) are certainly not free and I would also prefer not to think about that. After working on performance-sensitive code for a while I try to avoid unnecessary overhead whenever possible, so I might be biased.
I see the value in current behavior, so was looking at it from the point of view of extracting that logic into a separate behavior that can be composed together with request-response rather than being forced onto users. It'll also help with debugging in some cases since LRU means connection might succeed in some cases, but not others, which is sometimes tricky to find.
Not really, it depends on how many requests are triggered before the dial is actually happening. If I queue over 100 requests to previously unknown peers, dial will annoyingly fail for some of them. Moreover the cognitive distance between where the address comes from and where it is used is way too large in that case, which hurts debugging. I already have an address ready to be used, the |
An abstract behavior for storing peer addresses (and other info) is currently WIP in #5724.
I completely understand that for you specific use-case this is the most straightforward solution. IMO we should only move forward with this PR if either a) the use-case can't be solved with the existing API (which I don't think is the case here, especially with #5724) or b) we have numbers that show that the theoretical performance impact can in fact be measured 🙂 |
I am more in favor of removing the logic completely. It is trivial to write a behavior that has either LRU or takes advantage of #5724 or something else. Opt-out means more complexity, not less.
IMO #5724 is completely orthogonal to what I need here. I don't need more behaviors and more complex data flow in the app, I need less behavior and logic that is easy to follow. I don't have anything meaningful to add to this conversation. Not all changes make sense upstream, it is totally fine if this PR is closed without merging, I can understand and respect that, certainly appreciate non-trivial amount of time dedicated to review and thinking about implications regardless of the outcome. In such case either libp2p fork or fork of request-response protocol will likely be the path forward for us, dealing with caches and additional peer store behaviors triggers way too many edge-cases as alluded to above and will in fact be harder to maintain than just forking libp2p, sadly. |
Description
We discussed #5634 at one of the community calls and the ways to approach it, one of the challenges we discussed was around the argument type and how to handle the case where
PeerId
is not specified inDialOpts
.I went with a relatively straightforward route: use
PeerId
when it is specified (same behavior as before this PR) and useConnectionId
otherwise, wrapping both into an enum to use as a key in internalHashMap
.Implementation ended up being fairly simple with the only breaking change being
peer
field inrequest_response::Event::OutboundFailure
changing fromPeerId
toOption<PeerId>
.Resolves #5634
Notes & open questions
One thing I do not like too much and only did for backwards compatibility purposes is additional
impl From<&PeerId> for DialOpts
such that existing calls continue to work. If changing&peer_id
topeer_id
is an acceptable breaking change then that impl can be removed.Change checklist