Discovery items type error failure #253
Comments
@digitaldan I moved this one into its own issue. I am still seeing this error being generated a couple of times every hour. Anyway, I did some research and found a possible solution: explicitly setting the `Connection: keep-alive` header on the requests. I did some testing, and the request module seems to default to `Connection: close`. Do you see any reason for not making this change? I tested both setups:

Direct Connection

Behind Reverse Proxy Server (nginx)
If that works I don't see a problem with it. I'm not sure why it works :-) AFAIK, keep-alive is used so the underlying TCP connection can be reused by multiple requests, saving the cost of creating and tearing down a TCP connection for each request; browsers would be painfully slow without it. If the request is still in flight and there is a pause in the data stream, I guess it's possible something like an nginx proxy could terminate it if keep-alive is not set, so maybe that's what it is fixing?
My understanding is that when sending keep-alive to the server, it will not close the connection after responding to the initial request, unless the client requests it. While the connection remains open, it can be reused, and it will time out after a specific idle period. On top of this, I believe keep-alive is enabled by default in HTTP/1.1. I think we should enable it, deploy, and see if we still get that error. Do you see any reason why we shouldn't?
I am totally good with it.
So the connection keep-alive change I made didn't resolve this issue. After doing more research, I believe the issue is related to the fact that the myopenhab cloud connector nginx server is buffering every request/response before passing the data on. That buffer has a limited size and can become full at times when there are a lot of simultaneous discovery requests in a short amount of time. My understanding is that when the buffer gets full, some of the data chunks may be dropped in flight, causing a partial response to be sent. Going over the nginx configuration committed in the cloud service GitHub repository, I can see that response buffering is in effect; we could disable it with:

```
proxy_buffering off;
```

Also, we could enable gzip compression for the JSON responses:

```
gzip on;
gzip_comp_level 6;
gzip_types application/json;
```

@digitaldan What do you think?
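For context, a sketch of how such directives might sit together in the proxy's server block (the upstream name and location path are assumptions, not the actual myopenhab config):

```
# Sketch only; upstream name and location are illustrative assumptions.
location / {
    proxy_pass http://openhabcloud;
    proxy_http_version 1.1;
    proxy_buffering off;          # stream the response instead of buffering it

    gzip on;
    gzip_comp_level 6;
    gzip_types application/json;  # compress the large items payloads
}
```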
I'll take a look at our nginx config. I'm not 100% sure if we set that for a reason or it was copied from some default nginx proxy config example.
So, since we now have the request timeout logic deployed, we get the bearer token for the request that timed out in the logs. Based on my testing on a few of them, it seems that the request times out after 60 seconds, when the myopenhab cloud connector nginx server gives up with a "504 Gateway Time-out" error. Anyway, I was under the impression the cloud connector can track remote clients that are offline and directly respond to external requests with an offline message response. The reason I am bringing this up is that each of the failed requests is wasting 8 seconds of compute time. There are about 35 timed-out requests across the regions per hour so far. Maybe there is something that can be easily done on the cloud connector side.
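A minimal sketch of failing fast on the client side rather than waiting out the proxy's 60-second 504 (the URL and timeout value are assumptions, not the skill's actual settings):

```javascript
// Hypothetical request options: abort after 8 s instead of letting the call
// ride out the proxy's 60 s before the 504 Gateway Time-out arrives.
const itemsRequestOptions = {
  url: 'https://myopenhab.org/rest/items', // assumed endpoint for illustration
  json: true,   // ask the client to parse the body as JSON when possible
  timeout: 8000 // milliseconds; illustrative budget for the items call
};
```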
According to this documentation, if the response body cannot be parsed as JSON, it is returned as a string. Additionally, after doing some tests on one of the affected users, I noticed that the proper JSON object was returned on subsequent calls. So we could introduce retry logic at that level if the items data isn't valid.
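A hypothetical sketch of that retry logic (the function and parameter names are mine, not the skill's):

```javascript
// Retry fetching the items while the body comes back as a (truncated) string
// instead of a parsed JSON object.
function getItemsWithRetry(fetchItems, triesLeft, callback) {
  fetchItems((error, body) => {
    // With a JSON-parsing client, an unparsable body stays a string.
    const invalid = !error && typeof body === 'string';
    if ((error || invalid) && triesLeft > 1) {
      return getItemsWithRetry(fetchItems, triesLeft - 1, callback);
    }
    callback(error || (invalid ? new Error('invalid items data') : null), body);
  });
}
```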
So the change I included in #270 adds retry logic when the items data isn't valid.
The retry logic is helping a little bit, but there are still some requests that aren't able to get the complete items data after 4 tries. I don't think we can avoid tweaking the nginx configuration. Anyway, we need to add gzip support. One thing I noticed testing with two separate errors: the data always gets truncated at 48k (49152 bytes), no matter what the source is. This is a clear indication of some sort of buffering/paging issue on the server side.
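Since the truncation point is so consistent, a simple guard could flag these responses before parsing; a hypothetical helper (name and threshold handling are mine):

```javascript
// Flag a body that looks truncated at the 48 KiB mark observed in the logs.
function looksTruncatedAt48k(body) {
  return typeof body === 'string' &&
         Buffer.byteLength(body, 'utf8') === 48 * 1024; // 49152 bytes
}
```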
Based on this post, it seems that the timeout error may be caused by the fact that the user's cloud connector binding may only be set up for notification access mode, with remote access disabled. Is this something that can be determined at the cloud connector level, so it can directly return an error instead of waiting until reaching timeout?
Per the skill logs, several users' discovery requests are failing due to the error below.
After enabling debug mode, the issue seems to be related to the items data value being a string: the request call seems to have read/received only a partial response, preventing the expected JSON object from being parsed properly.
The example based on the reference below isn't considerably big in size, although it's just partial data, and the Lambda execution time for that request was about 5 seconds.
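To illustrate the failure mode, a minimal sketch of how a JSON-parsing client ends up handing back a string for a partial payload:

```javascript
// When the payload is truncated mid-stream, JSON.parse throws and the raw
// string is passed through instead of a parsed object.
function parseItemsBody(rawBody) {
  try {
    return JSON.parse(rawBody);
  } catch (err) {
    return rawBody; // caller receives a string, not the expected array
  }
}
```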
Lambda request log reference