Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support keep alive property #32

Open
raytung opened this issue Mar 3, 2022 · 6 comments
Open

Support keep alive property #32

raytung opened this issue Mar 3, 2022 · 6 comments

Comments

@raytung
Copy link
Member

raytung commented Mar 3, 2022

Hey 👋

There was a change to Fluentd's out_forward awhile ago to support keep alive behaviour fluent/fluentd#2393. Is there any chance of implementing this in this client as well? The use case is identical to the original issue in fluent/fluentd#2188 where we'd like our clients to rebalance its connections over to newly spawned fluentd replicas.

@jamiees2
Copy link
Collaborator

jamiees2 commented Mar 3, 2022

Hey! Just to make sure I understand the feature request correctly, you would like to pass a parameter to the socket that enables TCP keepalive on the socket?

I'm not super familiar with this feature, we don't use it in our production, so just wondering if anything else is needed.

@raytung
Copy link
Member Author

raytung commented Mar 3, 2022

Hey @jamiees2 thanks for your reply. The property name is kind of confusing, while it's called "keep alive" it's not really related to TCP keep alive. It's more akin to "tcp connection refresh interval".

To describe the problem a bit more in details, say we have something like this

Client replica [1...N] ---------> Load balancer ------> Fluentd replica A

In the event where fluentd is underload, we would scale it horizontally, say add 1 more replica. However, adding an extra fluentd replica doesn't currently help rebalance the load since our existing clients are still connected to the original fluentd replica.

The out_forward plugin's solution to this problem is basically adding a parameter to periodically disconnect and reconnect to upstream fluentd, thereby balancing the load overtime.

@raytung
Copy link
Member Author

raytung commented Mar 3, 2022

For what it's worth, when we tried simulating this behaviour by having a basic setInterval(() => { client.disconnect().then(client.connect()); });, we found that this client crashes our NodeJS process. I've created a separate ticket to track that issue #33

@jamiees2
Copy link
Collaborator

I've been thinking about how it's best to implement this, this is made a little complex by acknowledgements, because we don't want to miss an acknowledgement during this phase. I think what we ideally want to do is:

On an interval, close the socket for writes
If acks are enabled, wait for the ack queue to empty out
Once acks have been completed, shut down the socket
Reconnect the socket

I think this can be implemented with a new state, similar to IDLE, but I'll need to think through this a bit. I'm going to cut a release though, with your fixes, so that you're able to use the simulated behavior :)

It might mishandle acks, and might conflict with the reconnection behavior, but if it works for you, that's great!

@jamiees2
Copy link
Collaborator

I wonder if there are other approaches we're missing here, for example we could instead of a hard timeout have a count limit, which prevents us sending more than X messages without reconnecting the socket. Would we consider this more flexible?

Actually, I guess we could implement both in the same way, by having a setInterval that sets a flag

@raytung
Copy link
Member Author

raytung commented Mar 16, 2022

On an interval, close the socket for writes
If acks are enabled, wait for the ack queue to empty out
Once acks have been completed, shut down the socket
Reconnect the socket

Thinking out loud: One potential area to look out for is in the event where upstream fluentd is unresponsive, the acks may never come back, which means we'll be stuck waiting. Perhaps we could rely on a short ack timeout in this case, and if ack timeout we shutdown the socket and reconnect?

Another avenue to explore is probably to straight up ignore acks when we're reconnecting, we re-emit the events in ack queue after we reconnected. Would this make the implementation simpler? Looks like this is what Fluentd's own out_forward does https://github.com/fluent/fluentd/blob/2036c4a6d791de24165d9575243eb71e5914b3b8/lib/fluent/plugin/out_forward/socket_cache.rb#L70-L95

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants