Working through Segmentation Fault #2

Open
tillig opened this issue Feb 2, 2019 · 10 comments

Comments

@tillig
Contributor

tillig commented Feb 2, 2019

Over in the C# wrapper lib, you noted you'd gotten things working and then started hitting segmentation faults.

I made the same changes you made (basically), got a P/Invoke solution working, went to get a cup of coffee, and when I tried it again - Segmentation Fault. Rebooted the Pi... seg fault. I'm legitimately stumped as to why. I run the program and as soon as it tries to write to the LED strip, segmentation fault.

I'm curious if you've tried this SWIG-based wrapper. It feels heavy-handed, but the only thing I can think of is that there's some sort of incorrectly handled memory pinning or marshaling going on.

Something super weird: If I add some Console.WriteLine statements in before I try rendering to the LED strip, I can actually get some of the LEDs to work. Out of a strip of 540 LEDs, I can get to about LED 50 before the seg fault hits - and the only difference is the Console.WriteLine. I have no technical proof, but my gut is saying it has something to do with the memory allocated for the string getting written to the console. Like it's just enough to shift things around and allow the program to work.

But why it stopped originally I honestly can't say. Worked, then just didn't work anymore.

@tillig
Contributor Author

tillig commented Feb 2, 2019

OK, verified: I added some Console.WriteLine statements to the WS281x class, right before the calls to the native methods that do the initialization and such. Basically console debugging statements to see if I could figure out where the seg fault is happening...

...and now I can't get it to happen again. I went to get another cup of coffee - still works. Rebooted the Pi, tried again, still works.

I hate to cargo-cult develop this thing, but this is one of the most black magical things I've seen in a long time. Is it something where, I dunno, like, the GC needs to allocate enough heap to properly allow the interop to function and it doesn't do that until enough managed memory is allocated? I'm stabbing in the dark, but I figured you may be able to get past some of your seg faults if you just try temporarily throwing some Console.WriteLine in there. Maybe. Or not. Rambling now. So confused.

@tillig
Contributor Author

tillig commented Feb 2, 2019

On changing my demo program to do new things beyond just setting the lights to a specific color, I'm back to square one with the random seg fault. Console.WriteLine isn't saving me.

@DanielSSilva
Contributor

Hey there @tillig. Thanks for reaching out. Yes, I've had some weird behaviour with the LED strips. It was more of a proof of concept that it would work with PowerShell, although I would like to get back to this. The seg faults are random for me as well. Surely it has to do with some wrong memory allocations/frees, but since it calls the C++ wrapper, I don't really know where that is. The marshaling thing was new to me, something I hadn't worked with before, so it was more of a try/fail/try-again approach.
Regarding your "Out of a strip of 540 LEDs, I can get to about LED 50 before the seg fault hits": if I recall correctly, I'm not sure the Pi can handle those 540 LEDs (maybe I'm mixing up the Pi's capacity with the power supply's capacity, not sure). When I have time, I will come back to this project. If you find something in the meantime, please share!
You might also want to check out the unosquare/raspberryio implementation of an LED strip (it might not be the same strip, but if the datasheet is similar, I guess it can be used).

PS: Regarding the SWIG-based wrapper, I haven't tried it.

@tillig
Contributor Author

tillig commented Feb 4, 2019

Hey, yeah, sorry for the weird stream of consciousness thing going on here. This whole thing is nuts.

The length of the LED strip isn't an issue; I have a decent power supply and have power inserted at intervals along it. I was using this TCP server controller that uses the same ws281x library and that mostly works. I did find there's a buffer overflow in the TCP server part where there's a maximum number of commands that can be issued at a time or the whole thing hangs; that's why I was looking into P/Invoke.

As it turns out, I think what we're seeing is actually a bug in the .NET Core CLR on ARM. If you search for "segmentation fault .net core", it appears that segmentation faults on ARM processors with Debian (i.e., Raspberry Pi with Raspbian) happen a lot. They appear to still be working out the kinks.

I've started diving in using strace and gdb to see what's going on and the error, when it happens, is consistently in garbage collector code. I think it has something to do with marshaling heap memory between managed and unmanaged code but I haven't totally narrowed it down.

I've created a minimal repro and submitted an issue to the coreclr repo on it. I still intend on trying to get a core dump to debug and figure out where exactly things are going wrong, but if you want to follow along, there are the links.

I'll close this for now. You can subscribe to that issue if you want, or not, your call. If you do figure something out, let me know.

I guess worst case scenario, I could try to write a fully managed version of that light controller library using something like System.Device.Gpio though I'm not sure if that, too, may result in a segfault. It'd be nice to not try to reverse engineer the protocol for the LED strips; it seemed pretty complicated.

@tillig tillig closed this as completed Feb 4, 2019
@DanielSSilva
Contributor

We can leave this open, since I'm interested in this and it's not solved yet. Also, I wanted to say awesome work! I will definitely follow this.

@DanielSSilva DanielSSilva reopened this Feb 4, 2019
@tillig
Contributor Author

tillig commented Feb 4, 2019

Sounds good. If/when I get it all resolved, I'll post back.

@DanielSSilva
Contributor

@tillig I've been following the other issue and saw that you worked through it. Awesome job there! Do you want to make a PR so that this gets fixed?

@tillig
Contributor Author

tillig commented May 15, 2019

I finally got this working. It took a lot of work and help from some folks in the CoreCLR repo who know interop waaaay better than me, but it's working.

You can see the resolved version with a working example in my reproduction repository. In order to get it functioning I had to do three things that differ from the C# wrapper library:

  • Switch ws2811_t to be a class. If you don't do this, the struct value gets boxed and the memory pin isn't right.
  • Ensure both channel structures (ws2811_channel_t) are present in the ws2811_t class so it matches the one outlined in the unmanaged library (which has an array of two channels).
  • Pass the _ws2811Handle.AddrOfPinnedObject() as an IntPtr to the interop methods instead of a ref to the ws2811 object instance. Doing this ensures the native library gets the object memory location and not the location of the .NET object method table.
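
For anyone skimming, here is a rough sketch of how those three changes fit together. It is not the code from the repo: the field lists are a simplified guess at the native structs (check them against the library's ws2811.h before relying on them), the library name in the DllImport attributes is an assumption, and the Marshal.WriteInt32 call only stands in for the native library writing through the pointer, so the sketch runs without a Pi attached.

```csharp
using System;
using System.Runtime.InteropServices;

// Simplified, illustrative mirror of the native channel struct.
// The real field list must match the library's header exactly.
[StructLayout(LayoutKind.Sequential)]
public struct ws2811_channel_t
{
    public int gpionum;
    public int invert;
    public int count;
    public int strip_type;
    public IntPtr leds;
    public byte brightness;
}

// Change 1: a class instead of a struct, so GCHandle.Alloc pins this exact
// object instead of a boxed copy of a value.
// Change 2: both channels are declared, so the managed object is as large as
// the native struct (which holds an array of two channels).
[StructLayout(LayoutKind.Sequential)]
public class ws2811_t
{
    public ulong render_wait_time;
    public IntPtr device;
    public IntPtr rpi_hw;
    public uint freq;
    public int dmanum;
    public ws2811_channel_t channel_0;
    public ws2811_channel_t channel_1;
}

public static class Native
{
    // Change 3: the interop methods take a raw IntPtr, not a ref to the object.
    // The library name "ws2811" is an assumption; adjust to your build.
    [DllImport("ws2811")]
    public static extern int ws2811_init(IntPtr ws2811);

    [DllImport("ws2811")]
    public static extern int ws2811_render(IntPtr ws2811);
}

public static class Program
{
    public static void Main()
    {
        var ws2811 = new ws2811_t { freq = 800000, dmanum = 10 };
        ws2811.channel_0.count = 540;

        // Pin the object so the GC cannot move it while native code holds the pointer.
        GCHandle handle = GCHandle.Alloc(ws2811, GCHandleType.Pinned);
        try
        {
            // Address of the first field's data, not of the object header.
            IntPtr ptr = handle.AddrOfPinnedObject();

            // On a Pi with the native library installed, this is where
            // Native.ws2811_init(ptr) and Native.ws2811_render(ptr) would go.

            // Stand-in for a native write: change dmanum through the raw pointer.
            int dmaOffset = (int)Marshal.OffsetOf<ws2811_t>(nameof(ws2811_t.dmanum));
            Marshal.WriteInt32(ptr, dmaOffset, 5);

            // Because the pinned object is the same instance we hold, the write
            // is visible from managed code.
            Console.WriteLine($"dmanum seen from managed code: {ws2811.dmanum}");
        }
        finally
        {
            // Only free the handle after the native side is done with the pointer.
            handle.Free();
        }
    }
}
```

The handle has to stay allocated for as long as the native library might touch the memory, which is why a real wrapper would likely hold it in a field (like the _ws2811Handle mentioned above) and free it on dispose rather than in a short try/finally.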

Again, it's probably easiest to see this in action in my repo. I was able to run this on a Raspberry Pi 3B and see my light strand do a nice red color wipe, exactly as I hoped, with no seg faults or memory marshaling issues.

I'll leave the issue open here in case you want to chat more, otherwise... there you go!

@tillig
Contributor Author

tillig commented May 15, 2019

Hahahaha jinx! Looks like I was typing while you were, too.

Yeah, I can PR it. Probably won't be tonight, but I can get something together in the next few days.

@DanielSSilva
Contributor

Awesome job, mate! Really glad it's finally working. Thank you for your patience and persistence.
