bug: circuit-relay connectivity fails sometimes #1143

Open
chaitanyaprem opened this issue Jun 27, 2024 · 2 comments

Comments

@chaitanyaprem
Collaborator

chaitanyaprem commented Jun 27, 2024

Describe the bug
During a dogfooding session, a node trying to connect to another desktop node through its circuit-relay address failed on the first attempt, but succeeded on the second.

To Reproduce
Steps to reproduce the behavior:

  1. Run a status-desktop node n1 that does not have a public port exposed (due to UPnP failure or something else).
  2. Get its addresses and note down the circuit-relay address.
  3. Connect from another status-desktop node n2 to n1 using n1's circuit-relay address.
  4. Tail geth.log on n2, grepping for the peer ID of n1:
    tail -f geth.log | grep <n1 peerID>
  5. Observe that the connection is established at first, and that n1 then disconnects after trying to initiate the metadata request on its own. Note that the metadata request/response from n1 to n2 would already have completed successfully.

Expected behavior
A circuit-relay connection should always succeed.

Logs of two attempts: in the first, the connection failed after some time; in the second, it succeeded.

failed_circuit-relay.log

As per logs, n2's address is 16Uiu2HAkwZiPPrXBPuFg2HthDYm6zL5hyvLsVmS22UossWNcLDGX

Possible issues:

  1. either the nwaku node acting as the relay server is flaky,
  2. or the sequence of events on the receiving go-waku node n2 caused this failure.

cc @richard-ramos

@chaitanyaprem
Collaborator Author

chaitanyaprem commented Jun 27, 2024

I wonder whether this blocking code in the event handler is causing an issue. Maybe we should run it in a goroutine.
WDYT @richard-ramos?

ctx, cancel := context.WithTimeout(pm.ctx, 7*time.Second)
defer cancel()
if err := pm.metadata.DisconnectPeerOnShardMismatch(ctx, peerEvt.PeerID); err != nil {
    return
}

I also noticed that the second time, when the connection succeeded, no PeerJoin event was received from the underlying layer, so the metadata connection was never triggered.

@richard-ramos
Member

Could be.

Looking at the implementation of events in go-libp2p (p2p/host/eventbus/basic.go), I see that subscriptions have a default buffer of 16 to avoid locking, but I'm probably missing the full context on why this lock could happen.
