Backpressure handling #175
base: master
Conversation
The message-channel-full case can be handled with native Go `select` syntax:

```go
// func (t *InputModule) Start(ctx context.Context, msgChan chan<- logevent.LogEvent) (err error)
select {
case msgChan <- msg:
	// send log event normally
default:
	// the message channel is full
}
```
I don't think that pattern is good enough. There is no way for the output to signal back that there are issues that need attention, and the `select` pattern only covers the case where the channel is full and blocking. Let's use the HTTP output as an example. There are several outcomes from `Output()`:
In either case gogstash will continue to process messages, and they will be lost. The problem on the input side is that it does not know when it is OK to resume receiving messages, and will just have to retry at random intervals. While thinking about this: should there also be a `Requeue()` to allow the output to reschedule the event at a later time in case of issues?
I hope you see why I think we need a better kind of backpressure handling. I have been thinking, and I believe it would be cleaner to move `RequestPause()` and `RequestResume()` into output.OutputConfig, and `RegisterInput()` into config.InputConfig, still using `CanPause` so the developer gets some freedom in how to implement it. This way it looks more «integrated» into gogstash. I can write a new proposal on this if you want.
Output modules can retry forever to prevent data loss. In other words, the error handling can be done in the output modules. My questions:
Hi, I disagree that the output modules can retry forever. At some point they will either drop the event or gogstash runs out of memory. In my case I am using gogstash to handle around 180k events per second, and if an output tries to queue everything it will not take long before I run out of memory and gogstash crashes. If we look at the HTTP output: it will drop the message right away (or after a timeout), reporting an error. The elastic output will not report an error once the message has been handed to olivere, but will just log asynchronously that there has been an error. The file output will drop the event if for some reason it cannot write to the file. For your questions:
For the inputs:
Implementing this is not an easy task, as each input that we work on will take some time. My intention is to start on the framework and work on the inputs/outputs I know, and just make sure it works before we move on. I also mentioned a `Requeue()` event. Providing something like this makes it easier for each output to requeue an event in case of a retryable error, and gives us a common way to handle such errors.
Signed-off-by: tsaikd <[email protected]>
50d72c4: does the interface fit your requirement?
I had a quick look and it seems to work. I suggest we just give it a try and get some experience.
Signed-off-by: tsaikd <[email protected]>
Signed-off-by: tsaikd <[email protected]>
Is there something else you want me to do here?
Signed-off-by: tsaikd <[email protected]>
Up to you: keep it open for discussion or close it. Could you help check whether daf1889 is working?
Hi, sorry for the late answer. I have looked more into your control branch and implemented correct handling for two inputs in #178. I have tested both inputs by modifying outputstdout (that change is not committed) to pause for a few seconds after each message received. Would it be possible to merge my change in and commit it to master?
We discussed this a bit in #135. This is my proposal for an interface that can be used to handle backpressure.