fix(ws_transport): correct split header bytes (IDFGH-13859) #14706
base: master
I occasionally encounter an issue with the websocket where a semaphore is halted for 10 seconds. It seems that the underlying We need to conduct more tests. We will try to check and print the payload sizes to determine what is causing the issue.
Could be related? I'm not familiar with when that semaphore is held. What file is that in? The problem I've run into occurs whenever the next layer down's read function returns early (before the timeout, with fewer bytes than expected). When that happens, transport_ws's parser gets off by a few bytes and may then misinterpret payload as header. Worst case, the following payload starts with byte 126 or 127, which are the RFC 6455 codes for "extended payload length". When that happens, transport_ws may try to read a large (125B - 4GB) frame; if the other side isn't sending a lot of data, the read may take far too long to complete.
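To illustrate the misparse described above, here is a minimal sketch of RFC 6455 payload-length decoding. `ws_decode_payload_len` is a hypothetical helper written for this example, not the transport_ws code. It shows how a stray ASCII `~` (byte 126) in the length position makes a parser expect a 16-bit extended length taken from whatever payload bytes follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ESP-IDF.
 * Decodes the payload length from the start of a WebSocket frame header.
 * Returns the number of header bytes consumed (2, 4, or 10, mask key not
 * included), or -1 if fewer bytes are available than the header needs. */
static int ws_decode_payload_len(const uint8_t *hdr, size_t avail,
                                 uint64_t *payload_len)
{
    assert(hdr != NULL && payload_len != NULL);
    if (avail < 2) {
        return -1;
    }
    uint8_t len7 = hdr[1] & 0x7F;   /* low 7 bits of the second byte */
    size_t ext = 0;                 /* extended-length bytes to follow */
    if (len7 == 126) {
        ext = 2;                    /* 16-bit big-endian length */
    } else if (len7 == 127) {
        ext = 8;                    /* 64-bit big-endian length */
    }
    if (avail < 2 + ext) {
        return -1;                  /* header itself is split */
    }
    uint64_t len = len7;
    if (ext > 0) {
        len = 0;
        for (size_t i = 0; i < ext; i++) {
            len = (len << 8) | hdr[2 + i];
        }
    }
    *payload_len = len;
    return (int)(2 + ext);
}
```

If the parser is offset and reads the text payload bytes `4~ab` as a header, the `~` selects the 16-bit form and `ab` becomes the length (0x6162, about 24 KB), so the next read waits for data the server never sends.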
Are you using esp_websocket_client? It looks like that lock would be held across the call to esp_websocket_client_recv(), which can hang while parsing garbage. Are you using a binary payload by chance? That's likely to be worse - we're sending EngineIO, so our misinterpreted payload-as-header bytes are usually ASCII numbers.
We are using text for both sending and receiving. The problem sometimes occurs during both receiving and sending, and we haven't been able to identify the source to reproduce it consistently. We can reproduce it by dropping the Wi-Fi connection and through other external actions, but this doesn't reveal the origin of the problem.
LGTM, otherwise. Thanks for the fixes!
When the underlying transport returns header, length, or mask bytes early, call the underlying transport again. This fixes the WS parser getting offset when the server sends a burst of frames in which the last WS header is split across packet boundaries, so that fewer bytes than needed are available.
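The retry idea in this commit message can be sketched as a "read exactly N bytes" loop. This is an illustrative sketch only, not the code in this PR: `read_fn_t`, `read_exact`, and the demo `chunked_read` source are hypothetical names, with the callback shaped loosely after `esp_transport_read()` (bytes read on success, 0 on timeout, negative on error).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the underlying TCP/TLS transport read:
 * returns bytes read (> 0), 0 on timeout, < 0 on error. */
typedef int (*read_fn_t)(void *ctx, char *buf, int len, int timeout_ms);

/* Keep calling the underlying read until `len` bytes arrive, an error
 * occurs, or a call times out. Returns bytes read or a negative error. */
static int read_exact(read_fn_t read_fn, void *ctx, char *buf, int len,
                      int timeout_ms)
{
    assert(read_fn != NULL && buf != NULL && len >= 0);
    int total = 0;
    while (total < len) {
        int n = read_fn(ctx, buf + total, len - total, timeout_ms);
        if (n < 0) {
            return n;      /* propagate transport error */
        }
        if (n == 0) {
            break;         /* timeout: report the short read */
        }
        total += n;
    }
    return total;
}

/* Demo-only source that hands out at most `chunk` bytes per call,
 * mimicking a transport that returns early. */
typedef struct {
    const char *data;
    int len;
    int pos;
    int chunk;
} chunked_src_t;

static int chunked_read(void *ctx, char *buf, int len, int timeout_ms)
{
    (void)timeout_ms;
    chunked_src_t *s = (chunked_src_t *)ctx;
    int n = s->len - s->pos;
    if (n > len) {
        n = len;
    }
    if (n > s->chunk) {
        n = s->chunk;
    }
    for (int i = 0; i < n; i++) {
        buf[i] = s->data[s->pos + i];
    }
    s->pos += n;
    return n;              /* 0 once exhausted, treated as a timeout */
}
```

Because every retry passes the full `timeout_ms` again, the total wait in this sketch can exceed a single timeout; a deadline-based variant would shrink the timeout on each iteration.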
Thanks for the updates, LGTM!
Not ready to submit, prototype fix only. Posting for discussion.
Description
Prototype workaround for WS failing to parse when WS frames get split across a very specific timing and spacing pattern.
This fix is functional, but may rarely exceed the expected timeout.
I'm not sure if this is the right layer to fix it. Perhaps the underlying TCP and TLS transports should be fixed instead, so that they always wait up to the max timeout before returning fewer bytes than requested?
Related
Fixes #14704
Testing
To test, a printout was added when fallback is triggered. When we tried to make this failure more consistent, we could only get it to around 10% of the time. It's network-timing and traffic-burst specific.