Leader runs out of buffer #8797
-
It appears that MLE packets are getting stuck in the indirect transmission queue. There could be a number of things going on here. Some example possibilities:
Complete logs from both the FTD and the MTD should provide more visibility into the issue.
-
Hello Jonathan, this issue has been reproduced on the Nordic NRF52840 and the Silabs BRD4166A, but not on NXP (which, if I'm right, uses a proprietary stack). I will try to get logs from an end device. I first thought it was a platform-level issue, but since I can reproduce it on competing vendors' platforms, it seems to be related to OpenThread instead. I was also able to narrow down the origin of the issue a little: it always happens after a Tx retry:
As you can see, it tried to send the frame twice but did not retry a third time. After that, the device can no longer transmit (except announce and advertisement frames, if those events occur before the buffers are full). The number of retries varies randomly, but the problem always occurs after a retry.
-
Hello, unfortunately the DEBUG log level causes our platform to crash (too much data to print). I am also using a sniffer, and in the Wireshark capture I can see there was no third attempt here. The second attempt in the logs I gave is the last frame sent by our platform. The issue seems to be the same as in this thread, which gives me some leads, although I'm not sure I understand the fix: https://groups.google.com/g/openthread-users/c/FGcMZSRjQfs
-
Hello Jonathan, Abtin, we found the issue on our platform. It was related to the config OPENTHREAD_CONFIG_PLATFORM_USEC_TIMER_ENABLE, which caused trouble in the alarm management (too many CSMA backoff timers, which was too much for our platform under heavy traffic). We had enabled it at the beginning because of the line of code below, since we had OPENTHREAD_CONFIG_MAC_CSL_RECEIVER_ENABLE enabled on our FTD:
But in fact the FTD doesn't need this config, so we removed it as well, and now everything works nicely. Sorry for the disturbance, and thank you for all your quick answers :) Eric
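A minimal sketch of the fix described above, written as a project-level core config header for the FTD build. The file name and include guard are hypothetical, and the assumption is that the header is pulled into the OpenThread build through the project core config mechanism (for example `OPENTHREAD_PROJECT_CORE_CONFIG_FILE`); check your platform's build setup for where these options are actually defined.

```c
/* Hypothetical project core config header for the FTD (Leader) build.
 * Assumed to be included via the project's core config file setting. */
#ifndef PROJECT_FTD_CORE_CONFIG_H_
#define PROJECT_FTD_CORE_CONFIG_H_

/* Per the fix above: the FTD does not need CSL receiver support. */
#define OPENTHREAD_CONFIG_MAC_CSL_RECEIVER_ENABLE 0

/* Per the fix above: with CSL receive disabled, the microsecond platform
 * timer (whose alarm handling was overloaded under heavy traffic) is
 * no longer enabled on this build. */
#define OPENTHREAD_CONFIG_PLATFORM_USEC_TIMER_ENABLE 0

#endif /* PROJECT_FTD_CORE_CONFIG_H_ */
```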
-
Hello,
I am facing an issue with the platform we are developing.
On one side I have an FTD (Leader); on the other side I have 10 children.
When I start all of them at roughly the same time, the Leader quickly runs out of buffers. I get the following messages:
or
[0000185136] [REGION UNDEF] [I] MeshForwarder-: Prepping indir tx IPv6 UDP msg, len:83, chksum:5c6c, ecn:no, to:0xa41d, sec:yes, prio:net
[0000185137] [REGION UNDEF] [N] MeshForwarder-: Evicting IPv6 UDP msg, len:83, chksum:3477, ecn:no, sec:yes, error:NoBufs, prio:net
[0000185139] [REGION UNDEF] [N] MeshForwarder-: src:[fe80:0:0:0:2456:4fa5:b12a:173f]:19788
[0000185140] [REGION UNDEF] [N] MeshForwarder-: dst:[fe80:0:0:0:6478:529c:b841:3bf]:19788
The CLI command bufferinfo returns the following:
So all the buffers are indeed taken, and the Leader can neither store new incoming messages nor send any messages.
However, this seems to be a consequence rather than the root cause. By the time the free buffer count starts dropping to 0, the Leader has already stopped answering solicitations (Parent Requests, etc.) from the end devices.
I think that because frames are no longer handled, they remain in the buffers while the end devices keep sending frames, and we end up with the buffers full of messages.
Do you have any idea what could cause the device to suddenly stop handling incoming frames?
I first suspected the alarms, buffer handling at the platform level, the buffer size, or RAM corruption, but none of these seems to be the cause.
After the problem happens, if I run thread stop, then ifconfig up and thread start, the buffers are freed but immediately fill up again and no Tx is possible, so this doesn't solve the problem.