Use truncated exponential backoff for reconnection #272

Merged Oct 16, 2023 (1 commit)

@smortex (Member) commented Sep 26, 2023

When riemann-wrapper is connected to the Riemann server over TCP/TLS and the link breaks, it drops the events that failed to be sent, logs a warning, then immediately tries to reconnect and proceeds with the remaining data.

When the Riemann server is unreachable because of a network connectivity issue, this new connection will likely fail immediately: freshly gathered events are dropped, a new warning is logged, and a new connection is attempted right away.

Because we do not wait before reconnecting, we log an unexpectedly large number of messages about dropped events. Because of the delays introduced by the reconnection attempts, we might also send stale data once the connection succeeds again.

Rework the disconnection detection logic to apply truncated exponential backoff when the connection dies. Sleep at least 0.5s and at most 30s between attempts, and drop any pending events before trying to reconnect.
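
For readers unfamiliar with the technique, here is a minimal sketch of truncated exponential backoff with the bounds given above. It is not the code from this PR: it is written in Python for illustration, the names `backoff_delays` and `reconnect` are hypothetical, and the doubling factor and the choice to sleep before the first attempt are assumptions (the description only states the 0.5s and 30s bounds).

```python
import time


def backoff_delays(base=0.5, cap=30.0, factor=2.0):
    """Yield delays that grow by `factor` from `base` and are truncated at `cap`.

    The bounds come from the PR description; the factor of 2 is an assumption.
    """
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * factor)


def reconnect(connect, pending, sleep=time.sleep):
    """Drop pending events, then retry `connect` until it succeeds.

    `connect` is any callable that raises OSError on failure (hypothetical
    stand-in for opening the TCP/TLS connection). `pending` is any container
    with a `clear()` method holding the events not yet sent.
    """
    # Pending events would be stale by the time the link is back, so drop them.
    pending.clear()
    for delay in backoff_delays():
        # Wait before each attempt so that an unreachable server does not
        # cause a tight loop of failed connections and warnings.
        sleep(delay)
        try:
            return connect()
        except OSError:
            continue
```

With these defaults the waits are 0.5, 1, 2, 4, 8, 16, 30, 30, ... seconds. Passing `sleep` as a parameter keeps the loop testable without actually waiting.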

@smortex added the "enhancement" label Sep 26, 2023
@smortex marked this pull request as ready for review September 26, 2023 23:25
@jamtur01 self-requested a review October 16, 2023 12:35

@jamtur01 (Member) left a comment
LGTM

@jamtur01 merged commit 87fb130 into main Oct 16, 2023
8 checks passed
@jamtur01 deleted the backoff branch October 16, 2023 12:36