Replies: 26 comments 16 replies
-
You need to clean every object file (and every library/archive file) if you change this configuration between two builds.
-
Been there, done that, waiting for the T-shirt.
-
@TD-er can you post the full crash dump? The EXECVADDR is important as it says what the actual faulting read address was. If it's in 0x00000 + epsilon, you have a nullptr problem. If it's inside the IRAM area, then it's something else. The referenced line of code in the crash is a 32-bit read, which should work just fine in IRAM, but will crap out if you have, say, an odd address or are in an illegal region (i.e. 0x00000...).
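To make the triage above a bit more concrete, here is a minimal host-side sketch that sorts a faulting address into rough buckets. The region bounds are assumptions based on the commonly documented ESP8266 memory map (they shift with the MMU options), the 0x1000 "epsilon" is an arbitrary choice, and `classifyFaultAddress` / `FaultKind` are hypothetical names, not part of the core.

```cpp
#include <cstdint>

// Rough buckets for a faulting 32-bit read address on an ESP8266.
// All bounds below are ASSUMPTIONS; check them against your linker script.
enum class FaultKind { NullDeref, Illegal, Unaligned32, Dram, Iram, FlashMapped };

FaultKind classifyFaultAddress(uint32_t addr) {
    // "0x00000 + epsilon": a null pointer plus a small member offset.
    if (addr < 0x1000u) return FaultKind::NullDeref;

    const bool inDram  = addr >= 0x3FFE8000u && addr < 0x40000000u;
    const bool inIram  = addr >= 0x40100000u && addr < 0x40110000u;
    const bool inFlash = addr >= 0x40200000u && addr < 0x40300000u;
    if (!inDram && !inIram && !inFlash) return FaultKind::Illegal;

    // A 32-bit load needs a 4-byte aligned address.
    if ((addr & 0x3u) != 0) return FaultKind::Unaligned32;

    if (inDram) return FaultKind::Dram;
    if (inIram) return FaultKind::Iram;
    return FaultKind::FlashMapped;
}
```

Feeding it the address from each crash dump gives a quick first guess at whether you are looking at a null dereference, a garbage pointer, or an alignment problem.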
-
I just had another one...
Not sure if it is of any use, but just to be sure I saved the .pio build output.
-
That's a different signature than your original report. Looks like random memory corruption and not String related here. Address
-
Yep, it is indeed a different signature, and the more I think about it, the more I suspect it is some kind of corruption.
-
Do you have a list of which addresses can be considered valid?
And to make matters worse, I completely cleared all PIO files (cached, packages, .pio dir, etc.) only to see similar crashes, but now from a different call.
Edit: I added this to the functions
And it doesn't crash anymore.
-
OK, it does not seem to be related to the 2nd heap, but rather to something in the current staging code, as it now also happens when using this PIO environment:
Edit:
    web_server.setContentLength(CONTENT_LENGTH_UNKNOWN);
    web_server.sendHeader(F("Cache-Control"), F("no-cache"));
    if (origin.length() > 0) {
      web_server.sendHeader(F("Access-Control-Allow-Origin"), origin);
    }
    web_server.send(200, json ? F("application/json") : F("text/html"), EMPTY_STRING);
-
@earlephilhower I just posted some thoughts too in this issue: https://github.com/platformio/platform-espressif8266/issues/252#issuecomment-868421793 I'm not entirely sure what is causing this, but for sure there is something wrong in how the code is being built.
-
@TD-er I'm not sure what to say. It looks to me like a runtime data corruption issue, not a linker one. You've posted quite a few traces (please include the EPC= and EXECVADDR line or it's kind of useless) with varying PCs. All the EXECVADDR values that have been posted are fishy and of the form 0xAAAA0000, where the AAAA part is random garbage and the lower 16 bits are 0. My first impression is that you've got something in your code that's writing past the end of its buffer somewhere, and due to the way the heap/stack are laid out now it's become fatal. W/o a consistent failure and MCVE, there's not much we can do here, unfortunately.
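For anyone collecting these dumps, the 0xAAAA0000 shape is easy to test for mechanically. A small sketch follows; `looksLikeClobberedPointer` is a hypothetical helper name, and it is only a heuristic, since a legitimate region base address can also end in 0000.

```cpp
#include <cstdint>

// True when the address has the suspicious shape described above:
// non-zero upper 16 bits and all-zero lower 16 bits (0xAAAA0000).
// Heuristic only: a valid, 64 KB aligned address matches as well.
bool looksLikeClobberedPointer(uint32_t addr) {
    const bool lowerHalfZero    = (addr & 0xFFFFu) == 0;
    const bool upperHalfNonZero = (addr >> 16) != 0;
    return lowerHalfZero && upperHalfNonZero;
}
```

If most of the faulting addresses in a batch of crashes match, that points towards something zeroing two bytes of a pointer rather than towards a build or linker problem.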
-
Having dealt with an out-of-bounds (OOB) array access (with vector operator[], which does not check the bounds either), I found that adding -DDEBUG_ESP_OOM and -DUMM_POISON_CHECK to the Core build flags is a pretty good way to find the root cause. The build then crashes at a much earlier time, when the offending allocation happens, and not some time later when something else happens to be created in that memory region and the stack traces are nonsense. With PIO, this goes in build_flags = ...; the Arduino IDE has an option in the menu for OOM debug.
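For readers wondering why a poison check surfaces the bug earlier: the general idea is to put known marker bytes around each allocation and verify them later. The sketch below illustrates that idea on the host only. It is not umm_malloc's implementation, and `poisonedAlloc`, `poisonIntact`, `poisonedFree`, the guard size and the marker value are all made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustration of the poison idea only; NOT the code used by the core.
constexpr std::size_t kGuard = 4;     // guard bytes on each side (arbitrary)
constexpr uint8_t kPoison = 0xE5;     // marker value (arbitrary)

// Allocate `size` user bytes with a poisoned guard before and after.
void* poisonedAlloc(std::size_t size) {
    uint8_t* raw = static_cast<uint8_t*>(std::malloc(size + 2 * kGuard));
    if (raw == nullptr) return nullptr;
    std::memset(raw, kPoison, kGuard);                  // front guard
    std::memset(raw + kGuard + size, kPoison, kGuard);  // rear guard
    return raw + kGuard;
}

// Returns false as soon as any guard byte no longer holds the marker.
bool poisonIntact(const void* user, std::size_t size) {
    const uint8_t* raw = static_cast<const uint8_t*>(user) - kGuard;
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (raw[i] != kPoison) return false;
        if (raw[kGuard + size + i] != kPoison) return false;
    }
    return true;
}

void poisonedFree(void* user) {
    if (user != nullptr) std::free(static_cast<uint8_t*>(user) - kGuard);
}
```

A write one byte past the end lands in the rear guard, so the next check fails right next to the buffer that was overrun instead of in some unrelated object much later.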
-
Given this new page, https://arduino-esp8266.readthedocs.io/en/latest/ideoptions.html
-
@mcspr just pointed out something in my code which could cause some excessive allocations. I am testing it now, so maybe that will make these crashes a lot less likely... (fingers crossed) See: letscontrolit/ESPEasy#3693 And if there is something to help me debug these OOM issues or stack overflows, then I'm all ears.
-
Nope... still happening, but the web server does respond even faster than it did before, so fixing that issue at least helped a bit, just not with stability.
N.B. this is staging code, SDK222 and thus no 2nd heap or other MMU tweaks. Edit:
-
@d-a-v see https://docs.platformio.org/en/latest/platforms/espressif8266.html#debug-level, PIO's platformio.ini is a project-level config file that can change the build flags for Core + libs (build_flags) or project files (src_build_flags), so we only need to know which flags to use for what purpose.
-
@paulocsanz Do you have some concrete example of code that's now causing issues?
-
OK, I've squashed a number of bugs, including a rather fishy assembly implementation of a mutex I used (found by @mcspr). For example:
This function is called with a
-
@TD-er Please give me some code to try on my ESP8266.
Exception (28): LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads
-
Not tried it (as I'm not at a computer with access to a compiler), but it should be something like this:
    typedef std::map<String, int> myMap_t;
    myMap_t myMap;

    void crashtest() {
      {
        // First entry is allocated while the DRAM heap is selected.
        HeapSelectDram ephemeral;
        myMap[F("bla")] = 1;
      }
      {
        // Look up and insert while the IRAM (2nd) heap is selected.
        HeapSelectIram ephemeral;
        const String key(F("bla"));
        const String key2(F("bla2"));
        auto it = myMap.find(key);
        if (it != myMap.end()) // Make sure the compiler doesn't remove the find call
          myMap[key2] = 1;
      }
      {
        // Back on the DRAM heap, touch the entry that was created on IRAM.
        HeapSelectDram ephemeral;
        auto it = myMap.find(F("bla2"));
        if (it != myMap.end()) // Make sure the compiler doesn't remove the find call
          myMap[F("bla")] = 1;
      }
    }
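For anyone not familiar with the HeapSelect helpers used in that test: they follow the usual RAII scope-guard pattern, selecting a heap on construction and restoring the previous one on destruction. Below is a host-side stand-in that only models that behaviour. `HeapId`, `g_currentHeap`, `ScopedHeap` and `heapUsedForAllocation` are hypothetical names for this sketch; the real classes live in the ESP8266 core and do more than this.

```cpp
// Host-side model of a scoped heap selector; NOT the core's implementation.
enum class HeapId { Dram, Iram };

static HeapId g_currentHeap = HeapId::Dram;  // heap used by "allocations"

class ScopedHeap {
public:
    // Remember the heap that was active and switch to the requested one.
    explicit ScopedHeap(HeapId id) : previous_(g_currentHeap) {
        g_currentHeap = id;
    }
    // Restore the previous heap when the scope ends, even on early return.
    ~ScopedHeap() { g_currentHeap = previous_; }

    ScopedHeap(const ScopedHeap&) = delete;
    ScopedHeap& operator=(const ScopedHeap&) = delete;

private:
    HeapId previous_;
};

// Reports which heap a (simulated) allocation made right now would use.
HeapId heapUsedForAllocation() { return g_currentHeap; }
```

The point for the crash test is that a container created in one scope can still allocate in another: whichever heap is selected at the moment the map inserts a node is the heap that node ends up on.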
-
It works without errors.
-
Since the crashes seem to happen at random, I was wondering whether they may be related to this build flag:
@earlephilhower Is this a plausible cause for crashing?
-
No, that's not what's going on. All the crash reports you've shown point to the app corrupting a String's data pointer into a (16-bit, 0x0000) value, or some other structure's data pointers. Nothing related to strlen_P or anything String is shown; your app is just overwriting the String's internal data structures, and at that point things go boom, in the one strlen_P case you had. In the other, it was another pointer that had been overwritten in the same XXXX0000 way. Without an MCVE, there's still nothing we can do. I'm going to move this to a discussion since as of now there's nothing pointing to a core issue. If you find something that's repeatable we can move back or open a new bug w/MCVE/etc.
-
OK, just a small update here as I have been digging a bit more into those crashes I am experiencing. I have a number of functions in my project which have a I guess a lot of existing code out there may still try to process Let's assume a function which has a version to accept So I'm a bit lost here, also because I don't know how to add an assert to detect when the wrong function is being called at compile time.
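On the compile-time question: one possible approach, not something confirmed by anyone in this thread, is to declare the unwanted overload as deleted, so that passing a flash string to a function meant for String objects fails to compile instead of silently building a temporary. The sketch below uses host-side stand-ins; `FlashText`, `Text`, `logLine` and `canCallLogLine` are hypothetical names, with `FlashText` playing the role of the flash string helper type and `Text` the role of String.

```cpp
#include <type_traits>
#include <utility>

struct FlashText {};               // stand-in for the flash string type
struct Text {                      // stand-in for String
    Text(const char*) {}
    Text(const FlashText*) {}      // implicit conversion from a flash pointer
};

// Intended overload: works on an already constructed Text.
inline int logLine(const Text&) { return 1; }

// Deleted overload: it is the best match for a flash pointer, so such a
// call is rejected at compile time instead of converting implicitly.
int logLine(const FlashText*) = delete;

// Detection helper so the effect can be checked without a hard error.
template <class Arg>
auto canCallLogLine(int)
    -> decltype(logLine(std::declval<Arg>()), std::true_type{});
template <class Arg>
std::false_type canCallLogLine(...);
```

With the deleted overload in place, `logLine(Text("abc"))` still compiles, while passing a `const FlashText*` directly is a compile error, which forces the caller to construct the object explicitly and makes it visible where that temporary gets allocated.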
-
@TD-er Did you end up getting this resolved? I'm experiencing crashes in strnlen_P as well, and they appear random. I have a few different PlatformIO environments that enable different sets of debug Serial.println calls. It seems that certain builds see this happen with certain debug levels set, without any real change to the functional codebase of my app...
Exception 28: LoadProhibited: A load referenced a page mapped with an attribute that does not permit loads
PC: 0x40222068: strnlen_P at /workdir/repo/newlib/newlib/libc/sys/xtensa/string_pgmspace.c line 51
-
@bwjohns4 There have been some developments since. Mainly because of all these issues, and the fact that there are lots of other things demanding time, I have not worked on this since roughly the time of my last post here.
-
Is that latest toolchain more recent than PlatformIO's latest framework? Should I then use master instead of the latest PlatformIO release?
-
Basic Infos
Platform
Settings in IDE
PIO flags:
Problem Description
I have been working on this new (very nice) feature of using the 2nd heap to store some run time data.
For example I keep a queue of my events (String objects) and fetch them.
This did cause a crash in my project which eventually pointed to the use of strnlen_P (mind the 'n'), which is not used in my code, but is used in the String class.
The crash was one of the LoadProhibited errors you typically get when trying to use strlen-like calls on flash strings:
But now the really strange part, and that's why I can't (yet) include an MCVE...
It does occur when interacting with data from the 2nd heap, but to fix it I have to alter calls to functions which may expect a const String& and which are being called with a flash string.
For example when serving a web page, I enter code like this:
This already includes the 'fix' to wrap the flash string in a String constructor. But this function checkRAM does have a function definition which accepts a flash string.
So it 'feels' like this web serving call is happening right when data was being stored in my event queue, which does swap the heap to the IRAM heap.
I will try to see if I can also fix it by explicitly switching to the DRAM heap when serving a web page, but it does feel a bit tricky to say the least.
Also I am having a hard time getting this 2nd heap thing to work reliably.
It seems random whether the 2nd heap is active or not when compiling with PlatformIO.
Using PIO_FRAMEWORK_ARDUINO_MMU_CACHE16_IRAM48_SECHEAP_SHARED as PIO define does randomly have it compiled in.
When it doesn't work, calling ESP.getFreeHeap() always returns the same value regardless of the selected heap.