sched_ext: Rustland crashing and restarting on a laptop. #490
Comments
@0x006E thanks for reporting this. It'd be really helpful if you could grab the stdout of scx_rustland and report the trace when the error happens; unfortunately dmesg doesn't say much. Also keep in mind that scx_rustland should, at the moment, be considered mostly a proof-of-concept / experimental scheduler, and you should use scx_bpfland for better performance. But it's still interesting to investigate and understand the reason for these stalls. of course :) |
I'll add the trace too. Is there any other env needed for a backtrace, or anything like that, to add any more details? |
The trace shown to stdout should be enough for now to better understand what's happening, thanks! |
I have not been able to get a trace. I waited for about 30 minutes or so while doing my work, and it never crashed, but the system hung badly; even SysRq didn't work. The stdout does not look interesting, but I'll add it here. I am adding the logs from when the system froze completely: I'll try after a reboot to check if it crashes. Edit: I didn't know Ctrl + \ is used to trigger a core dump |
I'm a bit confused here. When you say "the system hang", do you mean that it's completely unresponsive? IIUC you can stop scx_rustland, and when you do so the system immediately recovers and is responsive again? With scx_rustland running, if a task isn't scheduled for 5 seconds the sched_ext watchdog should automatically kick out the scheduler and restore the default Linux scheduler (and in this case you should see scx_rustland spitting out a trace). But apparently you don't see this, which is why I'm confused. |
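To make the watchdog behaviour described above concrete, here is a minimal Python sketch of the idea only. It is not the kernel's implementation: the `Task` type, the `find_stalled` helper and the sample timestamps are all hypothetical, and only the 5-second timeout comes from the comment above.

```python
from dataclasses import dataclass

# Timeout taken from the comment above; everything else is illustrative.
WATCHDOG_TIMEOUT_S = 5.0


@dataclass
class Task:
    pid: int
    runnable_since: float  # time (seconds) at which the task became runnable


def find_stalled(tasks, now, timeout=WATCHDOG_TIMEOUT_S):
    """Return the PIDs of tasks that have waited longer than `timeout` to run."""
    return [t.pid for t in tasks if now - t.runnable_since > timeout]


# Hypothetical sample data: two runnable tasks that have not been scheduled yet.
tasks = [Task(pid=101, runnable_since=0.0), Task(pid=102, runnable_since=4.0)]

stalled = find_stalled(tasks, now=6.0)
if stalled:
    # This is the point at which the real watchdog would eject the scheduler
    # and fall back to the default one.
    print(f"runnable task stall for PIDs {stalled}: fall back to default scheduler")
```

In this sketch a single stalled task is enough to trigger the fallback, which matches the description above: one task left unscheduled for too long is treated as a scheduler failure.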
Sorry about that, the system was unresponsive, and I couldn't close or do anything for about 3 seconds. Anyway I finally got a crash. The crash happened when I was just browsing the web. This is the trace.
Edit 1: Got another crash
|
Thanks, that's really useful. I see lots of tasks in the traces that are not dispatched (meaning that they are probably sitting in the user-space scheduler's queue), so I'm wondering if the user-space scheduler is blocked by something and is unable to dispatch the tasks. For example, in the second trace it looks like the system is trying to reclaim memory. When this happens, if the user-space scheduler hits a page fault, we may have a deadlock. This should be prevented by setting When this problem happens, is your system under memory pressure conditions (almost out of free memory / |
I switched to bpfland, but it also freezes the system sometimes, though it has not crashed yet.
shows this. If you can give me something to test, I can do that too. |
Interesting that you're experiencing system freezes with bpfland as well. Does it happen with other sched_ext schedulers too, like scx_lavd or scx_rusty? |
I haven't tried others; I can try them and report whether these problems exist in those schedulers too. Memory stress does not make the scheduler crash; it only makes the system unresponsive. I'm also using zram, if that's relevant. 2024-08-13.15-10-55.mp4 |
I think the memory pressure condition is slowing down the user-space scheduler (somehow). That's probably why the system becomes unresponsive. But this shouldn't happen with other schedulers that don't have a relevant user-space counterpart, such as bpfland. I'll try to reproduce this on my side. Thanks! |
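For anyone trying to confirm whether the machine really is under memory pressure when a freeze happens, a rough Python sketch like the one below could help. It parses the standard Linux `/proc/meminfo` format and reports the fraction of memory still available. The helper names and the 10% threshold are arbitrary assumptions for illustration, not something used by scx_rustland or sched_ext.

```python
import os

# Arbitrary illustrative threshold: below 10% available counts as "low".
LOW_MEMORY_FRACTION = 0.10


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Fraction of total RAM that the kernel estimates is still available."""
    return info["MemAvailable"] / info["MemTotal"]


def is_low_on_memory(info, threshold=LOW_MEMORY_FRACTION):
    return available_fraction(info) < threshold


# Made-up sample in the same format as /proc/meminfo, so the sketch runs anywhere.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          200000 kB
MemAvailable:     800000 kB
SwapTotal:       8000000 kB
SwapFree:        1000000 kB
"""

sample_info = parse_meminfo(SAMPLE)
print("sample low on memory:", is_low_on_memory(sample_info))

# On a Linux machine, also report the live values.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        live = parse_meminfo(f.read())
    if "MemAvailable" in live and "MemTotal" in live:
        print(f"live available fraction: {available_fraction(live):.2%}")
```

Running this in a loop while reproducing the freeze (for example, logging the value once per second) would show whether the stalls line up with available memory dropping close to zero.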
I was wrong about the bpfland scheduler. It doesn't make the system unresponsive, even under low-memory conditions. But bpfland does crash sometimes (dmesg still shows a scheduler stall). I cannot capture a trace because the crash happens randomly and, unlike with rustland, very infrequently. |
Ok thanks for the update, makes sense, let me know if you can catch a trace with bpfland and I'll take a look. On the rustland side I'll do some tests locally under memory pressure conditions. |
I have the same problem: under high load, scx_rusty on musl fails. This happened during compilation, as can be seen from llvm-objcopy, but I'm not sure that this is one of the heaviest stages. In theory, before the single-threaded lld stage there was a multi-threaded one that saturates all processor cores and often takes a very long time.
|
|
Hi,
I'm on a laptop with an i5-1240P. I'm using scx_rustland. I'm frequently having an issue where scx_rustland crashes? At least dmesg suggests that. When opening apps, sometimes the whole system just freezes for a second and the CPU gets super hot.
This may be due to nvidia? That is the conclusion I arrived at after posting this in the chaotic nix support group.
This is my dmesg:
https://bpa.st/2XDQ
This is my corresponding nix config
https://github.com/0x006E/dotfiles