Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use correct ticker for all peers ping #1214

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

chaitanyaprem
Copy link
Collaborator

@chaitanyaprem chaitanyaprem commented Sep 4, 2024

Description

while going through code, noticed that randomPeersTicker was being over-ridden by allpeersTicker.
this causes peers to be disconnected forcefully every 5 minutes in relay mode

@chaitanyaprem
Copy link
Collaborator Author

while i was trying to capture peer counts and monitor bandwidth usage locally in relay mode, i had noticed peer counts drop to 0 every 5 minutes. while looking in the code for timers that are 5 minutes found that allpeersPingtimer is 5 mins and once i fixed this issue, i am not noticing peer counts drop to 0.

Can't figure out if really peer counts are going to 0 or something else is happening. any thoughts @richard-ramos

Here is the graph of running my desktop instance for about 6 hours and notice that towards the end peer-counts don't drop anymore (after applying this fix).

Screenshot 2024-09-04 at 1 17 33 PM

@chaitanyaprem
Copy link
Collaborator Author

chaitanyaprem commented Sep 4, 2024

i think i figured out what is happening without this fix, since ticker is being overridden by wrong timer below logic assumes peer ping failed since it has been too long and forcefully disconnects all peers every 5 minutes. This bug causes relay node to keep loosing connectivity every 5 minutes and re-establish connections with peers and will cause message reliability issues.

case <-randomPeersTickerC:
difference := w.timesource.Now().UnixNano() - lastTimeExecuted.UnixNano()
if difference > sleepDetectionInterval {
lastTimeExecuted = w.timesource.Now()
w.log.Warn("keep alive hasnt been executed recently. Killing all connections")
disconnectAllPeers(w.host, w.log)
continue

@richard-ramos @kaichaosun @plopezlpz is there a way we can get this fix released into status desktop patch version? otherwise all users will be facing constant disconnections (every 5 minutes they will loose all peers) and hence message reliability issues.

Logs while running status-desktop in relay mode without this fix show the same.


WARN [08-20|09:19:23.652|github.com/waku-org/go-waku/waku/v2/node/keepalive.go:87] keep alive hasnt been executed recently. Killing all connections 
WARN [09-04|09:28:04.044|github.com/waku-org/go-waku/waku/v2/node/keepalive.go:89]  keep alive hasnt been executed recently. Killing all connections 
WARN [09-04|09:33:04.046|github.com/waku-org/go-waku/waku/v2/node/keepalive.go:89]    keep alive hasnt been executed recently. Killing all connections 

Copy link
Contributor

@kaichaosun kaichaosun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is release branch for go-waku and status-go, cherry pick the commit and maybe tag QA team for version upgrade possibilities.

https://github.com/waku-org/go-waku/tree/status-releases/2.30.x

@chaitanyaprem
Copy link
Collaborator Author

There is release branch for go-waku and status-go, cherry pick the commit and maybe tag QA team for version upgrade possibilities.

https://github.com/waku-org/go-waku/tree/status-releases/2.30.x

since 2.30 is released, was not sure if this branch is still used or not.

@richard-ramos
Copy link
Member

Lets cc: @iurimatias, because i'm not sure what's the approach used in desktop for patch releases!

@iurimatias
Copy link
Collaborator

Lets cc: @iurimatias, because i'm not sure what's the approach used in desktop for patch releases!

We follow semver, we can do a 2.30.1 release for this (after testing), starting with 2.30.1-rc1.

@chaitanyaprem
Copy link
Collaborator Author

Lets cc: @iurimatias, because i'm not sure what's the approach used in desktop for patch releases!

We follow semver, we can do a 2.30.1 release for this (after testing), starting with 2.30.1-rc1.

thanks...is there a branch in status-go and status-desktop already for 2.30.1 or we need to create one for this alone? if there are any other important fixes, we can include them as well and then make a 2.30.1 release.

@iurimatias
Copy link
Collaborator

Lets cc: @iurimatias, because i'm not sure what's the approach used in desktop for patch releases!

We follow semver, we can do a 2.30.1 release for this (after testing), starting with 2.30.1-rc1.

thanks...is there a branch in status-go and status-desktop already for 2.30.1 or we need to create one for this alone? if there are any other important fixes, we can include them as well and then make a 2.30.1 release.

I'll prepare a PR / branch after a meeting. the QAs will need to test it etc..

@iurimatias
Copy link
Collaborator

@chaitanyaprem @richard-ramos is this going to be merged first? or should I point it to this branch to test?

I like @kaichaosun suggestion to merge it to release/2.30.x (besides master) it makes everything more consistent (and stable), I recommend that

@iurimatias
Copy link
Collaborator

I'll create the PR for release/2.3.x

@chaitanyaprem chaitanyaprem merged commit 3066ff1 into master Sep 4, 2024
15 checks passed
@chaitanyaprem chaitanyaprem deleted the fix/keepalive-ticker branch September 4, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants