HomeBridge Ring failing multiple times a day #1502
Comments
Hi, thanks for opening an issue and providing logs. The logs clearly show network connectivity errors. For the most part, some errors polling the Ring API are normal and not really indicative of a problem per se. Ring API servers are sometimes slow to respond and requests time out, but the plugin will just try again later. For example, I see anywhere from a dozen to 40-50 such messages a day in my own installation. You certainly seem to have a higher than normal failure rate, but these errors would not cause the plugin to stop functioning unless requests were failing every single time, and even then that would not likely impact alarm functions, which use a websocket for communications. It's important to note that the plugin is not doing anything special here: it calls the fetch API built into NodeJS and asks it to retrieve the URL. The fetch API either returns the data or returns an error if it can't retrieve it. If fetch fails, there's nothing for the code to do except report the error and try again. However, there is another interesting tidbit in the log, specifically this part:
So this is the websocket connection I mentioned above. All communication with Ring Alarm uses this websocket, not the HTTP requests. This log entry indicates that the websocket lost its connection and had to be re-established. Again, this happening every now and then is no big deal, but if you are seeing it often, it's just another indicator that something is happening at the network layer. Note that it usually takes a loss of connectivity of longer than 60 seconds before the websocket detects the problem and restarts, so this indicates significant network issues. In summary, you have the fetch API failing often, which indicates network issues, and the websocket disconnecting, which also indicates network issues. You will need to troubleshoot what network connectivity issue you are having. I would check the system logs on the host running Docker as a place to start, especially if this host is running in a VM on a KVM-based hypervisor (something like Proxmox, etc.), as that recently had a known issue that caused lots of dropped packets and created a lot of similar networking problems for users of ring-mqtt, which uses the same API. Alternatively, you can capture a packet dump with a tool like tcpdump or Wireshark.
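To illustrate why a dropped connection is only noticed after a period of silence, here is a minimal sketch of a heartbeat watchdog. It is not the actual ring-client-api implementation; the class name `HeartbeatWatchdog`, its methods, and the injectable clock are all hypothetical, and the 60 second figure is simply the one mentioned above.

```typescript
// Hypothetical helper: tracks when traffic was last seen on a connection.
// This is an illustration only, not code from ring-client-api.
class HeartbeatWatchdog {
  private lastSeenMs: number;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so the behaviour can be checked without waiting.
  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeenMs = now();
  }

  // Call whenever any message (or ping reply) arrives on the socket.
  recordTraffic(): void {
    this.lastSeenMs = this.now();
  }

  // True once the connection has been silent for longer than the timeout.
  isStale(): boolean {
    return this.now() - this.lastSeenMs > this.timeoutMs;
  }
}
```

A caller would typically check `isStale()` on a timer and tear down and reconnect the socket when it returns true, which is why a short network blip goes unnoticed while a long outage triggers a reconnect.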
Another suggestion: make sure you have a local, caching DNS server. I've been surprised to find that some users don't configure a local DNS server and just point all their clients at some upstream DNS server. However, ring-client-api creates a LOT of DNS queries, and I've seen cases where this floods the uplink with too much bursty traffic and some requests fail. Usually this happens with DSL or some other asymmetric connection where the upstream bandwidth is significantly lower than the downstream.
Thanks for the quick response. I have a local DNS and all is good there. I get that there may be occasional fetch misses, but this was not a problem on previous versions of the plug-in (pre 13, although I think 11.x was the best). Secondly, the biggest concern I have here is that there appears to be a leak of some kind, as the CPU and memory climb very rapidly to the point that the server is unresponsive. Restarting the Ring child bridge fixes it, so it is clearly the culprit. I will look into the network issue, but I do not believe that is the problem here; recovering from it seems to be.
Previous versions of the plugin didn't use fetch, because it didn't exist in NodeJS until v18. Instead, there were dependencies on other HTTP request libraries, such as got, for which there have been dozens, if not hundreds, of similar issues reported over the years. However, comments such as "I think 11.x was the best" kind of show that it's not related to this, because guess what, the code for the alarm portion is nearly 100% unchanged in the 13.x version. While we have bumped major versions due to required breaking changes in ring-client-api, on which many other projects depend, the actual changes to anything alarm related have been very, very small. Even for cameras, the only real changes are the streaming API being used (Ring deprecated the old one) and the switch to the new style FCM API for push notifications (cameras and intercoms only). In other words, while you perceive that somehow 11.x was "the best", the actual core that does things like request the API or maintain the websocket connection is about 99% identical between 11.x and 13.x. Yes, there was a fundamental change to use native fetch instead of got in v13, and theoretically this might cause some problems. But again, ring-client-api is used across a wide variety of projects with, collectively, tens of thousands of users, and the number of reports of such issues, with the exception of HOOBS because it has an outdated NodeJS that doesn't support fetch, is near zero. Of course, that is not to say there could not be some problem, but the only way to get to said problem is to collect data. If ring-client-api calls the NodeJS fetch API, and fetch reports that it couldn't get the data, what code can be changed to fix that on our side? If the OS reports that the TCP connection for the websocket is closed, what more can we do? Also, I'm not just a maintainer of this project, I'm also a user, and I have the plugin running on my network with zero issues even after weeks.
Without being able to reproduce issues, or users willing to dig in and collect detailed data, there's not much we can do about it. I mean, it's even possible that it could be some issue with the updated push-receiver, which we had to use in v13 to work with modern push notifications, but that's an upstream project, so we'd need to see evidence of an issue.
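For anyone who wants to collect more data on the fetch failures, the following is a rough sketch of the "report the error and try again" pattern described in this thread. It is not the ring-client-api code. The function names `describeFetchError` and `pollWithRetry` are hypothetical, and the `fetcher` parameter is an assumption made so the sketch does not depend on a particular runtime. It relies on the fact that Node's built-in fetch rejects with a generic "fetch failed" error whose `cause` property usually carries the underlying network error.

```typescript
// Summarize an error thrown by fetch, appending the underlying cause
// (error code or message) when one is attached.
function describeFetchError(err: unknown): string {
  const e = (err ?? {}) as { message?: unknown; cause?: unknown };
  const cause = (e.cause ?? {}) as { code?: unknown; message?: unknown };
  const message = typeof e.message === "string" ? e.message : String(err);
  const detail =
    typeof cause.code === "string"
      ? cause.code
      : typeof cause.message === "string"
        ? cause.message
        : undefined;
  return detail ? `${message} (${detail})` : message;
}

// Hypothetical retry loop: try the request, log any failure, wait, try again.
// `fetcher` is passed in by the caller (for example the global fetch).
async function pollWithRetry(
  url: string,
  attempts: number,
  retryDelayMs: number,
  fetcher: (url: string) => Promise<{ ok: boolean; status: number }>,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetcher(url);
      if (response.ok) return true;
      console.warn(`HTTP ${response.status} from ${url}`);
    } catch (err) {
      console.warn(`Failed to reach ${url}. ${describeFetchError(err)}`);
    }
    if (attempt < attempts) {
      // Wait before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return false;
}
```

Logging the cause rather than only "fetch failed" helps distinguish DNS failures, timeouts, and connection resets, which point at different parts of the network.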
This comment was marked as off-topic.
I have the same issue after updating the plugin to 13.1.0 and switching to unbridged cameras, so unfortunately I can't say which of these two changes caused the errors. Would it help if I turn on debug logging and provide the logs, let's say tomorrow? [10/5/2024, 5:42:37 AM] [Ring] Failed to reach Ring server at https://api.ring.com/clients_api/ring_devices. fetch failed. Trying again in 5 seconds...
@benjorito What actual functional problem do you have? Occasional failures to reach the Ring API are to be expected. As long as they are not reported continuously every minute, there's really nothing unexpected here. Nothing was logged from 5:43AM until 6:09AM, which implies that these are just normal random failures, as there would have been plenty of successful requests in the interim.
@markcarroll I spent a bit of time going through your logs as well as looking at my own setup. I'm pretty convinced that, whatever issue you are having, it is not related to the log messages. I believe the messages in the logs, at least in your case, are red herrings. First, it's important to note why there are regular calls to this API. It is not really designed for frequent polling: the Ring apps poll it no more often than every 60 seconds and, in most cases, only poll it on an interactive basis, such as when opening the settings of a camera/chime/intercom or performing a manual refresh of such a page. For use in ring-client-api we need to keep device states updated, so the API is polled every 20 seconds and returns the current state for all camera/chime/intercom devices. Note that this is not used at all for alarm devices/sensors, or for motion/ding notifications; it is only for other states on these devices. Because this API is not intended for such frequent polling, occasional failures are expected and have no real side effects except that the state is updated later. I've noticed that some people love to set camera polling to far faster intervals, but this will just lead to more messages, as the state is only updated on the backend every 30 seconds or so anyway; it's certainly not immediate. The point is, these messages are expected to occur sometimes and, unless you are seeing them every 15 seconds (which shows even the retries fail), they are not an issue. Also note that there are many other API calls made as part of ring-client-api, such as when a push notification is received, and those are not failing, based on the logs (if they failed, they would also be logged). You never really stated what issue you are having. You state that arming/disarming works, but then you say "any interaction" will fail, and associate that with these failed fetch messages, but these polled APIs are not even used for alarm devices.
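One way to tell occasional failures from continuous ones is to count how many failure messages occur back to back. The sketch below is only an illustration: the function names are hypothetical, the timestamp format is assumed to match the log line quoted earlier in this thread (US-style month/day/year with AM/PM), and the 30 second gap used to group failures into a burst is an arbitrary assumption.

```typescript
// Matches lines like:
// [10/5/2024, 5:42:37 AM] [Ring] Failed to reach Ring server at <url>. fetch failed. ...
const FAILURE_LINE =
  /^\[(\d{1,2})\/(\d{1,2})\/(\d{4}), (\d{1,2}):(\d{2}):(\d{2}) (AM|PM)\] \[Ring\] Failed to reach Ring server/;

// Timestamp of a failure line in epoch milliseconds, or null for other lines.
// The time is treated as UTC, which is fine because only differences are used.
function failureTime(line: string): number | null {
  const m = FAILURE_LINE.exec(line);
  if (!m) return null;
  const [month, day, year, hour12, minute, second] = m.slice(1, 7).map(Number);
  const hour = (hour12 % 12) + (m[7] === "PM" ? 12 : 0);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

interface FailureSummary {
  total: number; // number of failure lines found
  longestBurst: number; // most failures in a row with small gaps between them
}

// Group failures that are at most maxGapSeconds apart into bursts.
function summarizeFailures(logText: string, maxGapSeconds = 30): FailureSummary {
  const times: number[] = [];
  for (const line of logText.split("\n")) {
    const t = failureTime(line.trim());
    if (t !== null) times.push(t);
  }
  times.sort((a, b) => a - b);

  let longestBurst = 0;
  let current = 0;
  let previous: number | null = null;
  for (const t of times) {
    const continues = previous !== null && t - previous <= maxGapSeconds * 1000;
    current = continues ? current + 1 : 1;
    longestBurst = Math.max(longestBurst, current);
    previous = t;
  }
  return { total: times.length, longestBurst };
}
```

A small `longestBurst` with long quiet periods in between is consistent with normal random failures, while long bursts suggest that even the retries are failing.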
So I guess what I'd like to ask is, what actual problem are you having? Can you please provide logs from when you attempt something such as arming/disarming? I just can't see how arming/disarming could have anything to do with the API log messages at the moment, since alarm devices use the websocket. The logs even show that camera push notifications and snapshot updates are still working. There's really nothing to indicate that the plugin is not working. I had similar reports with ring-mqtt, where two different users complained of random failures and high CPU or unexpected restarts. The issue turned out to be related to lack of memory (it seems NodeJS 20 uses a bit more RAM than prior versions), and simply adding an extra 1GB of RAM to the VM caused the issue to go away. In another case, a user was having random failures communicating with the API, and this turned out to be caused by some snort rules on his OPNsense firewall.
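To gather evidence about memory growth like that described in this issue, a process can sample its own resident set size (RSS) over time and flag sustained growth. The following is a minimal sketch under stated assumptions: `looksLikeLeak` and `startRssSampler` are hypothetical names, and the thresholds (minimum growth, 75% of samples rising) are arbitrary choices rather than anything used by the plugin.

```typescript
// Heuristic: memory "looks like a leak" when it grew by at least
// minGrowthBytes across the window and most samples were higher than the
// one before. The thresholds are illustrative assumptions.
function looksLikeLeak(
  rssSamples: number[],
  minGrowthBytes: number,
  minRisingFraction = 0.75,
): boolean {
  if (rssSamples.length < 3) return false;
  let rising = 0;
  for (let i = 1; i < rssSamples.length; i++) {
    if (rssSamples[i] > rssSamples[i - 1]) rising++;
  }
  const steps = rssSamples.length - 1;
  const growth = rssSamples[rssSamples.length - 1] - rssSamples[0];
  return growth >= minGrowthBytes && rising / steps >= minRisingFraction;
}

// Periodically read RSS via the supplied function and keep a sliding window.
// Returns a function that stops the sampling.
function startRssSampler(
  readRss: () => number,
  intervalMs: number,
  windowSize: number,
  minGrowthBytes: number,
  onSuspect: (samples: number[]) => void,
): () => void {
  const samples: number[] = [];
  const timer = setInterval(() => {
    samples.push(readRss());
    if (samples.length > windowSize) samples.shift();
    if (samples.length === windowSize && looksLikeLeak(samples, minGrowthBytes)) {
      onSuspect([...samples]);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In Node, `() => process.memoryUsage().rss` could be passed as `readRss`. Steady growth by itself does not prove a leak in any particular component, but a record of it alongside the plugin logs gives maintainers something concrete to look at.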
@pcziesch I would prefer that you open a new issue and provide the full requested details, including full logs during startup. I will mark your posts here as off-topic so as not to confuse them with the original issue.
Can someone who is experiencing this issue please enable debug in the plugin configuration and post logs from that? It should provide a complete stack trace of the failure.
@tsightler sorry it took so long for me to reply. The issue I am having is that after a few hours my HomeBridge instance cranks up to near 100% CPU and 100% RAM usage. I am relatively sure it is the Ring plugin at this point, since resetting the Ring child bridge causes everything to drop back down to normal levels again. While it is maxed out, nothing works. I only started noticing this around the time of the v12 update, although I cannot be sure exactly when. Once I restart HomeBridge, or the Docker container (which is often what I have to do, since the HomeBridge UI won't respond), everything goes back to normal. I have noticed that it seems to get worse once I interact with the Ring alarm. So, for example, things are working well all day. I turn on my scene at bedtime that switches the alarm to Home mode, which works. However, when I get up in the morning and try to turn off the alarm it will not respond, and I open the HB UI to find CPU and memory have shot up to near max. The same happens if I restart it at night and then use it to successfully turn off the alarm in the morning: it will most likely not be OK by the time I come to use it that evening.
@markcarroll Thanks for your response. Can you please provide some additional details about your deployment? How is Homebridge deployed? What host is it running on and how many resources does it provide? How many Ring devices/cameras do you have? The code in question is in use in more than 10,000 installs with only a handful of such reports, so I'm trying to understand what might be unique that is triggering the behavior as, overall, there are very few differences between 12 and 13. The major version jump was primarily due to a breaking API change with notifications, not due to major changes in the code.
It is running on Docker on a Raspberry Pi. Not much else is on there right now, as I tried to isolate the instance to debug it a bit. There had also been no changes to the device between it working great and the current situation, so I am reluctant to change too much there. I have
LMK what else you need
What model RPi? How much RAM? Which RPi OS version? 32 or 64-bit? Which Docker image are you using? Your Ring setup is reasonably similar to mine. I have an alarm with slightly fewer sensors, with a Chime and 8 total cameras that are a mix of Floodlight Pro cams, stickup cams, and a wired Doorbell Pro (I have no battery cams). I'm running a test setup on an RPi3, so only 1GB, and it all works fine for weeks with no issue using the latest Raspberry Pi OS 12, 64-bit version, with the standard homebridge Docker image. This is why it is so difficult to understand what might even be the issue. Without being able to reproduce it, there's very little to go on.
This is why it is so odd that it is eating up all the RAM.
OK, I have that exact model in the closet, so I'll pull it out and set it up. Not sure that it will help, but worth a spin I guess. I'm quite honestly not sure what to even try; there are very few things that it can be, IMO. I wish I could somehow get access to a misbehaving system.
Appreciate the effort, but before you spend any more time on it, I think this weekend I might tear it down and set up a fresh instance. This build has been running through many updates and plugins, so starting afresh might help clear out any anomalies.
OK. I would be interested in which OS version specifically, and which Docker version. My setup is running the latest RPi OS version based on Debian 12, and the latest Docker community edition (not the stock one from the repo). Other interesting questions: are you using a wifi or ethernet connection? There has been at least one case where a firewall was somehow triggering on the requests, so they would work for a bit and then be blocked. That user had to add an exception to the intrusion prevention rules on their firewall.
Is there an existing issue for this?
Describe The Bug
Getting "Failed to reach Ring server at https://api.ring.com/clients_api/ring_devices. fetch failed. Trying again in 5 seconds..." very frequently. It makes HomeBridge unusable (maxes CPU and memory) until I reboot the container. The HomeBridge docker container is running Node 20.
Note that rebooting doesn't actually fix the Ring plugin for long. I can change alarm state etc briefly after a reboot, but a couple of hours later it is broken again. Turning the alarm on or off will work immediately after a reboot, but given a couple of hours, any interaction with Ring will cause the error. It seems to put Homebridge into 9x% CPU and fills up memory so a full container reboot is needed.
To Reproduce
Just leave HB running. Container logs will show list of errors.
Expected behavior
It works like it used to.
Relevant log output
Screenshots
No response
Homebridge Ring Config
Additional context
homebridge-ring.log.txt
OS
Ubuntu Jammy (22.04.4 LTS)
Node.js Version
v20.17.0
NPM Version
?
Homebridge/HOOBs Version
Homebridge v1.8.4 · UI v4.57.1
Homebridge Ring Plugin Version
v13.1
Operating System
Docker