Make StatsRelay detect if StatsD Daemons are Alive #2
Comments
Initially I was thinking about using something like mon to periodically check whether the statsd backends are up (a port check against the statsd admin port?). If mon detects that a statsd host is down, it would restart the statsrelay daemon(s) with that host left out; when the host comes back online, it would restart the statsrelay daemon(s) again with the host included. That said, I really like your idea of building this kind of functionality into statsrelay. Some ideas/thoughts/2c from my side:
Unfortunately I am not much of a programmer and my coding kung fu is very weak, but I will help out as much as possible on the testing side of things!
I would suggest creating a fixed-size memory buffer that just gets overwritten when it fills up. I would also couple that with some sort of timeout: buffer X MB of metrics for Y seconds. Y would be the TTL before you remove the node from the ring and start sending metrics to another node (as noted above, the least bad situation). When the node comes back up, flush the buffer to the previously used node and add the now-up node back to the ring.
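To make the buffer-plus-TTL idea concrete, here is a rough sketch in Python (not StatsRelay's own code). The class name `DownNodeBuffer` and its methods are hypothetical, made up for illustration. It assumes the relay already knows when a node went down and only shows the bookkeeping: oldest packets are dropped once the size cap is hit, and the TTL tells the caller when to give up and remove the node from the ring.

```python
import time
from collections import deque


class DownNodeBuffer:
    """Hypothetical bounded buffer for packets aimed at a node that looks down."""

    def __init__(self, max_bytes, ttl_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes        # the "X MB" cap
        self.ttl_seconds = ttl_seconds    # the "Y seconds" TTL
        self._clock = clock               # injectable so the TTL can be tested
        self._packets = deque()
        self._size = 0
        self._down_since = None

    def mark_down(self):
        # Remember when the node was first seen down; repeated calls are no-ops.
        if self._down_since is None:
            self._down_since = self._clock()

    def add(self, packet):
        if len(packet) > self.max_bytes:
            return  # can never fit, so drop it
        self._packets.append(packet)
        self._size += len(packet)
        # Overwrite behaviour: evict the oldest packets until we are under the cap.
        while self._size > self.max_bytes:
            self._size -= len(self._packets.popleft())

    def ttl_expired(self):
        # True once the node has been down for at least ttl_seconds.
        if self._down_since is None:
            return False
        return self._clock() - self._down_since >= self.ttl_seconds

    def drain(self):
        # Called when the node is back: hand over what was kept and reset.
        packets = list(self._packets)
        self._packets.clear()
        self._size = 0
        self._down_since = None
        return packets
```

In this sketch the caller would check `ttl_expired()` on each send: while it is false, keep calling `add()`; once it is true, drop the node from the ring and route to another one. When the node answers again, `drain()` returns the surviving packets so they can be flushed to it before it rejoins the ring.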
The current code base does nothing to detect or react to StatsD daemons that are not alive. The UDP StatsD protocol is designed to be fire-and-forget and offers no way to detect if the other side has received the packet.
StatsD daemons have a TCP administrative interface that's probably very useful for checking if the process is alive. That may be of help with this issue.
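As a rough illustration of what such a liveness check could look like, here is a small Python sketch (not StatsRelay code). The function name `statsd_alive` is hypothetical. It assumes the admin interface listens on port 8126 and answers a `health` command with a line containing `up`, which matches the stock StatsD management interface as far as I know; adjust both if your deployment differs.

```python
import socket


def statsd_alive(host, port=8126, timeout=1.0):
    """Return True if the StatsD admin interface answers 'health' with 'up'.

    Assumptions: port 8126 and the 'health' command are the stock StatsD
    admin defaults; any connection error or timeout counts as "not alive".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"health\n")
            reply = sock.recv(128)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return b"up" in reply.lower()
```

A relay could call something like this on a timer for each backend and treat a few consecutive `False` results as "down", so that a single slow reply does not remove a node from the ring.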
Things to think about: